@c ***************************************************************************
@node GNUnet Developer Handbook
@chapter GNUnet Developer Handbook

This book is intended as an introduction for programmers who want to
extend the GNUnet framework. GNUnet is more than a simple peer-to-peer
application. For developers, GNUnet is:

@itemize @bullet
@item Free software under the GNU General Public License, with a community
that believes in the GNU philosophy
@item
A set of standards, including coding conventions and architectural rules
@item
A set of layered protocols, specifying both the communication between peers
and the communication between components of a single peer.
@item
A set of libraries with well-defined APIs suitable for writing extensions
@end itemize

In particular, the architecture specifies that a peer consists of many
processes communicating via protocols. Processes can be written in almost
any language. C and Java APIs exist for accessing existing services and for
writing extensions. It is possible to write extensions in other languages by
implementing the necessary IPC protocols.

GNUnet can be extended and improved along many possible dimensions, and anyone
interested in free software and freedom-enhancing networking is welcome to
join the effort. This developer handbook attempts to provide an initial
introduction to some of the key design choices and central components of the
system. This manual is far from complete, and we welcome informed
contributions, be it in the form of new chapters or insightful comments.

However, the website is experiencing a constant onslaught of sophisticated
link-spam entered manually by exploited workers solving puzzles and
customizing text. To limit this commercial defacement, we are strictly
moderating comments and have disallowed "normal" users from posting new
content. However, this is really only intended to keep the spam at bay. If
you are a real user or aspiring developer, please drop us a note (IRC, e-mail,
contact form) with your user profile ID number included. We will then relax
these restrictions on your account. We're sorry for this inconvenience;
however, few people would want to read this site if 99% of it was
advertisements for bogus websites.



@c ***************************************************************************








@menu
* Developer Introduction::
* Code overview::
* System Architecture::
* Subsystem stability::
* Naming conventions and coding style guide::
* Build-system::
* Developing extensions for GNUnet using the gnunet-ext template::
* Writing testcases::
* GNUnet's TESTING library::
* Performance regression analysis with Gauger::
* GNUnet's TESTBED Subsystem::
* libgnunetutil::
* The Automatic Restart Manager (ARM)::
* GNUnet's TRANSPORT Subsystem::
* NAT library::
* Distance-Vector plugin::
* SMTP plugin::
* Bluetooth plugin::
* WLAN plugin::
* The ATS Subsystem::
* GNUnet's CORE Subsystem::
* GNUnet's CADET subsystem::
* GNUnet's NSE subsystem::
* GNUnet's HOSTLIST subsystem::
* GNUnet's IDENTITY subsystem::
* GNUnet's NAMESTORE Subsystem::
* GNUnet's PEERINFO subsystem::
* GNUnet's PEERSTORE subsystem::
* GNUnet's SET Subsystem::
* GNUnet's STATISTICS subsystem::
* GNUnet's Distributed Hash Table (DHT)::
* The GNU Name System (GNS)::
* The GNS Namecache::
* The REVOCATION Subsystem::
* GNUnet's File-sharing (FS) Subsystem::
* GNUnet's REGEX Subsystem::
@end menu

@node Developer Introduction
@section Developer Introduction

This developer handbook is intended as a first introduction to GNUnet for new
developers who want to extend the GNUnet framework. After the introduction,
each of the GNUnet subsystems (directories in the @file{src/} tree) is (supposed to
be) covered in its own chapter. In addition to this documentation, GNUnet
developers should be aware of the services that the GNUnet server makes
available to them.

New developers can have a look at the GNUnet tutorials for C and Java available
in the @file{src/} directory of the repository or under the following links:

@c ** FIXME: Link to files in source, not online.
@c ** FIXME: Where is the Java tutorial?
@itemize @bullet
@item @uref{https://gnunet.org/git/gnunet.git/plain/doc/gnunet-c-tutorial.pdf, GNUnet C tutorial}
@item GNUnet Java tutorial
@end itemize

In addition to this book, the GNUnet server contains various resources for
GNUnet developers. They are all conveniently reachable via the "Developer"
entry in the navigation menu. Some additional tools (such as static analysis
reports) require special developer access to perform certain operations. If
you feel you need access, you should contact
@uref{http://grothoff.org/christian/, Christian Grothoff}, GNUnet's maintainer.

The public subsystems on the GNUnet server that help developers are:

@itemize @bullet
@item The version control system keeps our code and enables distributed
development. Only developers with write access can commit code, everyone else
is encouraged to submit patches to the
@uref{https://lists.gnu.org/mailman/listinfo/gnunet-developers, GNUnet-developers mailinglist}.
@item The GNUnet bugtracking system is used to track feature requests, open bug
reports and their resolutions. Anyone can report bugs; only developers can
claim to have fixed them.
@item A buildbot is used to check GNUnet builds automatically on a range of
platforms. Builds are triggered automatically after 30 minutes of no changes to
Git.
@item The current quality of our automated test suite is assessed using Code
coverage analysis. This analysis is run daily; however, the webpage is only
updated if all automated tests pass at that time. Testcases that improve our
code coverage are always welcome.
@item We try to automatically find bugs using a static analysis scan. This scan
is run daily; however, the webpage is only updated if all automated tests pass
at that time. Note that not everything that is flagged by the analysis is a bug;
sometimes even good code can be marked as possibly problematic. Nevertheless,
developers are encouraged to at least be aware of all issues in their code that
are listed.
@item We use Gauger for automatic performance regression visualization. Details
on how to use Gauger are here.
@item We use @uref{http://junit.org/, junit} to automatically test gnunet-java.
Automatically generated, current reports on the test suite are here.
@item We use Cobertura to generate test coverage reports for gnunet-java.
Current reports on test coverage are here.
@end itemize



@c ***************************************************************************
@menu
* Project overview::
@end menu

@node Project overview
@subsection Project overview

The GNUnet project currently consists of several sub-projects. This section
gives an initial overview of the various sub-projects. Note that this
description also lists projects that are far from complete, including even
those that have literally not a single line of code in them yet.

GNUnet sub-projects in order of likely relevance are currently:

@table @asis

@item gnunet Core of the P2P framework, including the file-sharing, VPN and
chat applications; this is what this handbook mostly covers
@item gnunet-gtk Gtk+-based user interfaces, including gnunet-fs-gtk
(file-sharing), gnunet-statistics-gtk (statistics over time),
gnunet-peerinfo-gtk (information about current connections and known peers),
gnunet-chat-gtk (chat GUI) and gnunet-setup (setup tool for "everything")
@item gnunet-fuse Mounting directories shared via GNUnet's file-sharing on Linux
@item gnunet-update Installation and update tool
@item gnunet-ext Template for starting 'external' GNUnet projects
@item gnunet-java Java APIs for writing GNUnet services and applications
@c ** FIXME: Point to new website repository once we have it:
@c ** @item svn/gnunet-www/ Code and media helping drive the GNUnet website
@item eclectic Code to run
GNUnet nodes on testbeds for research, development, testing and evaluation
@c ** FIXME: Solve the status and location of gnunet-qt
@item gnunet-qt qt-based GNUnet GUI (dead?)
@item gnunet-cocoa cocoa-based GNUnet GUI (dead?)

@end table

We are also working on various supporting libraries and tools:
@c ** FIXME: What about gauger, and what about libmwmodem?

@table @asis
@item libextractor GNU libextractor (meta data extraction)
@item libmicrohttpd GNU libmicrohttpd (embedded HTTP(S) server library)
@item gauger Tool for performance regression analysis
@item monkey Tool for automated debugging of distributed systems
@item libmwmodem Library for accessing satellite connection quality reports
@end table

Finally, there are various external projects (see links for a list of those
that have a public website) which build on top of the GNUnet framework.

@c ***************************************************************************
@node Code overview
@section Code overview

This section gives a brief overview of the GNUnet source code. Specifically, we
sketch the function of each of the subdirectories in the @file{gnunet/src/}
directory. The order given is roughly bottom-up (in terms of the layers of the
system).
@table @asis

@item util/ --- libgnunetutil Library with general utility functions; all
GNUnet binaries link against this library. Anything from memory allocation and
data structures to cryptography and inter-process communication. The goal is to
provide an OS-independent interface and more 'secure' or convenient
implementations of commonly used primitives. The API is spread over more than a
dozen headers; developers should study those closely to avoid duplicating
existing functions.
@item hello/ --- libgnunethello HELLO messages are used to
describe under which addresses a peer can be reached (for example, protocol,
IP, port). This library manages parsing and generating of HELLO messages.
@item block/ --- libgnunetblock The DHT and other components of GNUnet store
information in units called 'blocks'. Each block has a type and the type
defines a particular format and how that binary format is to be linked to a
hash code (the key for the DHT and for databases). The block library is a
wrapper around block plugins which provide the necessary functions for each
block type.
@item statistics/ The statistics service enables associating
values (of type uint64_t) with a component name and a string. The main uses are
debugging (counting events), performance tracking and user entertainment (what
did my peer do today?).
@item arm/ The automatic-restart-manager (ARM) service
is the GNUnet master service. Its role is to start gnunet-services, to restart
them when they crash and finally to shut down the system when requested.
@item peerinfo/ The peerinfo service keeps track of which peers are known to
the local peer and also tracks the validated addresses of each of those peers
(in the form of a HELLO message). The peer is not necessarily
connected to all peers known to the peerinfo service. Peerinfo provides
persistent storage for peer identities --- peers are not forgotten just because
of a system restart.
@item datacache/ --- libgnunetdatacache The datacache
library provides (temporary) block storage for the DHT. Existing plugins can
store blocks in Sqlite, Postgres or MySQL databases. All data stored in the
cache is lost when the peer is stopped or restarted (datacache uses temporary
tables).
@item datastore/ The datastore service stores file-sharing blocks in
databases for extended periods of time. In contrast to the datacache, data is
not lost when peers restart. However, quota restrictions may still cause old,
expired or low-priority data to be eventually discarded. Existing plugins can
store blocks in Sqlite, Postgres or MySQL databases.
@item template/ Template
for writing a new service. Does nothing.
@item ats/ The automatic transport
selection (ATS) service is responsible for deciding which address (i.e. which
transport plugin) should be used for communication with other peers, and at
what bandwidth.
@item nat/ --- libgnunetnat Library that provides basic
functions for NAT traversal. The library supports NAT traversal with manual
hole-punching by the user, UPnP and ICMP-based autonomous NAT traversal. The
library also includes an API for testing if the current configuration works and
the @code{gnunet-nat-server} which provides an external service to test the
local configuration.
@item fragmentation/ --- libgnunetfragmentation Some
transports (UDP and WLAN, mostly) have restrictions on the maximum transfer
unit (MTU) for packets. The fragmentation library can be used to break larger
packets into chunks of at most 1k and transmit the resulting fragments
reliably (with acknowledgement, retransmission, timeouts, etc.).
@item transport/ The transport service is responsible for managing the basic P2P
communication. It uses plugins to support P2P communication over TCP, UDP,
HTTP, HTTPS and other protocols. The transport service validates peer addresses,
enforces bandwidth restrictions, limits the total number of connections and
enforces connectivity restrictions (i.e. friends-only).
@item peerinfo-tool/
This directory contains the gnunet-peerinfo binary which can be used to inspect
the peers and HELLOs known to the peerinfo service.
@item core/ The core
service is responsible for establishing encrypted, authenticated connections
with other peers, encrypting and decrypting messages and forwarding messages to
higher-level services that are interested in them.
@item testing/ ---
libgnunettesting The testing library allows starting (and stopping) peers for
writing testcases. It also supports automatic generation of configurations for
peers ensuring that the ports and paths are disjoint. libgnunettesting is also
the foundation for the testbed service.
@item testbed/ The testbed service is
used for creating small or large scale deployments of GNUnet peers for
evaluation of protocols. It facilitates peer deployments on multiple hosts (for
example, in a cluster) and establishing various network topologies (both
underlay and overlay).
@item nse/ The network size estimation (NSE) service
implements a protocol for (securely) estimating the current size of the P2P
network.
@item dht/ The distributed hash table (DHT) service provides a
distributed implementation of a hash table to store blocks under hash keys in
the P2P network.
@item hostlist/ The hostlist service allows learning about
other peers in the network by downloading HELLO messages from an HTTP server,
can be configured to run such an HTTP server and also implements a P2P protocol
to advertise and automatically learn about other peers that offer a public
hostlist server.
@item topology/ The topology service is responsible for
maintaining the mesh topology. It tries to maintain connections to friends
(depending on the configuration) and also tries to ensure that the peer has a
decent number of active connections at all times. If necessary, new connections
are added. All peers should run the topology service, otherwise they may end up
not being connected to any other peer (unless some other service ensures that
core establishes the required connections). The topology service also tells the
transport service which connections are permitted (for friend-to-friend
networking).
@item fs/ The file-sharing (FS) service implements GNUnet's
file-sharing application. Both anonymous file-sharing (using gap) and
non-anonymous file-sharing (using dht) are supported.
@item cadet/ The CADET
service provides a general-purpose routing abstraction to create end-to-end
encrypted tunnels in mesh networks. We wrote a paper documenting key aspects of
the design.
@item tun/ --- libgnunettun Library for building IPv4, IPv6
packets and creating checksums for UDP, TCP and ICMP packets. The header
defines C structs for common Internet packet formats and in particular structs
for interacting with TUN (virtual network) interfaces.
@item mysql/ ---
libgnunetmysql Library for creating and executing prepared MySQL statements and
to manage the connection to the MySQL database. Essentially a lightweight
wrapper for the interaction between GNUnet components and libmysqlclient.
@item dns/ Service that allows intercepting and modifying DNS requests of the
local machine. Currently used for IPv4-IPv6 protocol translation (DNS-ALG) as
implemented by "pt/" and for the GNUnet naming system. The service can also be
configured to offer an exit service for DNS traffic.
@item vpn/ The virtual
private network (VPN) service provides a virtual tunnel interface (VTUN) for IP
routing over GNUnet. Needs some other peers to run an "exit" service to work.
Can be activated using the "gnunet-vpn" tool or integrated with DNS using the
"pt" daemon.
@item exit/ Daemon to allow traffic from the VPN to exit this
peer to the Internet or to specific IP-based services of the local peer.
Currently, an exit service can only be restricted to IPv4 or IPv6, not to
specific ports and/or IP address ranges. If this is not acceptable, additional
firewall rules must be added manually. exit currently only works for normal
UDP, TCP and ICMP traffic; DNS queries need to leave the system via a DNS
service.
@item pt/ Protocol translation daemon. This daemon enables 4-to-6,
6-to-4, 4-over-6 or 6-over-4 transitions for the local system. It essentially
uses the DNS service to intercept DNS replies and then maps the results to
those offered by the VPN, which then sends them using mesh to some daemon
offering an appropriate exit service.
@item identity/ Management of egos (alter egos) of a
user; identities are essentially named ECC private keys and used for zones in
the GNU name system and for namespaces in file-sharing, but might find other
uses later.
@item revocation/ Key revocation service, can be used to revoke the
private key of an identity if it has been compromised.
@item namecache/ Cache
for resolution results for the GNU name system; data is encrypted and can be
shared among users; loss of the data should ideally only result in a
performance degradation (persistence not required).
@item namestore/ Database
for the GNU name system with per-user private information; persistence required.
@item gns/ GNU name system, a GNU approach to DNS and PKI.
@item dv/ A plugin
for distance-vector (DV)-based routing. DV consists of a service and a
transport plugin to provide peers with the illusion of a direct P2P connection
for connections that use multiple (typically up to 3) hops in the actual
underlay network.
@item regex/ Service for the (distributed) evaluation of
regular expressions.
@item scalarproduct/ The scalar product service offers an
API to perform a secure multiparty computation which calculates a scalar
product between two peers without exposing the private input vectors of the
peers to each other.
@item consensus/ The consensus service will allow a set
of peers to agree on a set of values via a distributed set union computation.
@item rest/ The rest API allows access to GNUnet services using RESTful
interaction. The services provide plugins that can be exposed by the rest server.
@item experimentation/ The experimentation daemon coordinates distributed
experimentation to evaluate transport and ATS properties.
@end table

@c ***************************************************************************
@node System Architecture
@section System Architecture

GNUnet developers like legos. The blocks are indestructible, can be stacked
together to construct complex buildings and it is generally easy to swap one
block for a different one that has the same shape. GNUnet's architecture is
based on legos.

This chapter documents the GNUnet lego system, also known as GNUnet's system
architecture.

The most common GNUnet component is a service. Services offer an API (or
several, depending on what you count as "an API") which is implemented as a
library. The library communicates with the main process of the service using a
service-specific network protocol. The main process of the service typically
doesn't fully provide everything that is needed --- it has holes to be filled
by APIs to other services.

A special kind of component in GNUnet are user interfaces and daemons. Like
services, they have holes to be filled by APIs of other services. Unlike
services, daemons do not implement their own network protocol and they have no
API.

The GNUnet system provides a range of services, daemons and user interfaces,
which are then combined into a layered GNUnet instance (also known as a peer).

Note that while it is generally possible to swap one service for another
compatible service, there is often only one implementation. However, during
development we often have a "new" version of a service in parallel with an
"old" version. While the "new" version is not yet working, developers working on
other parts of the service can continue their development by simply using the
"old" service. Alternative design ideas can also be easily investigated by
swapping out individual components. This is typically achieved by simply
changing the name of the "BINARY" in the respective configuration section.
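For example, assuming a hypothetical alternative implementation of the DHT
service, such a swap could look like this in the peer's configuration (the
section/option pattern follows the text above; the alternative binary name is
made up for illustration):

```
[dht]
BINARY = gnunet-service-dht-alternative
```

ARM would then start the alternative binary in place of the default
implementation for that service.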

Key properties of GNUnet services are that they must be separate processes and
that they must protect themselves by applying tight error checking against the
network protocol they implement (thereby achieving a certain degree of
robustness).

On the other hand, the APIs are implemented to tolerate failures of the
service, isolating their host process from errors by the service. If the
service process crashes, other services and daemons around it should not also
fail, but instead wait for the service process to be restarted by ARM.


@c ***************************************************************************
@node Subsystem stability
@section Subsystem stability

This page documents the current stability of the various GNUnet subsystems.
Stability here describes the expected degree of compatibility with future
versions of GNUnet. For each subsystem we distinguish between compatibility on
the P2P network level (communication protocol between peers), the IPC level
(communication between the service and the service library) and the API level
(stability of the API). P2P compatibility is relevant in terms of which
applications will likely be able to communicate with future versions of
the network. IPC compatibility is relevant for the implementation of language
bindings that re-implement the IPC messages. Finally, API compatibility is
relevant to developers who hope to avoid having to change applications
built on top of the APIs of the framework.

The following table summarizes our current view of the stability of the
respective protocols or APIs:

@multitable @columnfractions .20 .20 .20 .20
@headitem Subsystem @tab P2P @tab IPC @tab C API
@item util @tab n/a @tab n/a @tab stable
@item arm @tab n/a @tab stable @tab stable
@item ats @tab n/a @tab unstable @tab testing
@item block @tab n/a @tab n/a @tab stable
@item cadet @tab testing @tab testing @tab testing
@item consensus @tab experimental @tab experimental @tab experimental
@item core @tab stable @tab stable @tab stable
@item datacache @tab n/a @tab n/a @tab stable
@item datastore @tab n/a @tab stable @tab stable
@item dht @tab stable @tab stable @tab stable
@item dns @tab stable @tab stable @tab stable
@item dv @tab testing @tab testing @tab n/a
@item exit @tab testing @tab n/a @tab n/a
@item fragmentation @tab stable @tab n/a @tab stable
@item fs @tab stable @tab stable @tab stable
@item gns @tab stable @tab stable @tab stable
@item hello @tab n/a @tab n/a @tab testing
@item hostlist @tab stable @tab stable @tab n/a
@item identity @tab stable @tab stable @tab n/a
@item multicast @tab experimental @tab experimental @tab experimental
@item mysql @tab stable @tab n/a @tab stable
@item namestore @tab n/a @tab stable @tab stable
@item nat @tab n/a @tab n/a @tab stable
@item nse @tab stable @tab stable @tab stable
@item peerinfo @tab n/a @tab stable @tab stable
@item psyc @tab experimental @tab experimental @tab experimental
@item pt @tab n/a @tab n/a @tab n/a
@item regex @tab stable @tab stable @tab stable
@item revocation @tab stable @tab stable @tab stable
@item social @tab experimental @tab experimental @tab experimental
@item statistics @tab n/a @tab stable @tab stable
@item testbed @tab n/a @tab testing @tab testing
@item testing @tab n/a @tab n/a @tab testing
@item topology @tab n/a @tab n/a @tab n/a
@item transport @tab stable @tab stable @tab stable
@item tun @tab n/a @tab n/a @tab stable
@item vpn @tab testing @tab n/a @tab n/a
@end multitable

Here is a rough explanation of the values:

@table @samp
@item stable
No incompatible changes are planned at this time; for IPC/APIs, if
there are incompatible changes, they will be minor and might only require
minimal changes to existing code; for P2P, changes will be avoided if at all
possible for the 0.10.x-series

@item testing
No incompatible changes are
planned at this time, but the code is still known to be in flux; so while we
have no concrete plans, our expectation is that there will still be minor
modifications; for P2P, changes will likely be extensions that should not break
existing code

@item unstable
Changes are planned and will happen; however, they
will not be totally radical and the result should still resemble what is there
now; nevertheless, anticipated changes will break protocol/API compatibility

@item experimental
Changes are planned and the result may look nothing like
what the API/protocol looks like today

@item unknown
Someone should think about where this subsystem is headed.

@item n/a
This subsystem does not have an API/IPC-protocol/P2P-protocol
@end table

@c ***************************************************************************
@node Naming conventions and coding style guide
@section Naming conventions and coding style guide

Here you can find some rules to help you write code for GNUnet.



@c ***************************************************************************
@menu
* Naming conventions::
* Coding style::
@end menu

@node Naming conventions
@subsection Naming conventions


@c ***************************************************************************
@menu
* include files::
* binaries::
* logging::
* configuration::
* exported symbols::
* private (library-internal) symbols (including structs and macros)::
* testcases::
* performance tests::
* src/ directories::
@end menu

@node include files
@subsubsection include files

@itemize @bullet
@item _lib: library without need for a process
@item _service: library that needs a service process
@item _plugin: plugin definition
@item _protocol: structs used in network protocol
@item exceptions:
@itemize @bullet
@item gnunet_config.h --- generated
@item platform.h --- first included
@item plibc.h --- external library
@item gnunet_common.h --- fundamental routines
@item gnunet_directories.h --- generated
@item gettext.h --- external library
@end itemize
@end itemize

@c ***************************************************************************
@node binaries
@subsubsection binaries

@itemize @bullet
@item gnunet-service-xxx: service process (has listen socket)
@item gnunet-daemon-xxx: daemon process (no listen socket)
@item gnunet-helper-xxx[-yyy]: SUID helper for module xxx
@item gnunet-yyy: command-line tool for end-users
@item libgnunet_plugin_xxx_yyy.so: plugin for API xxx
@item libgnunetxxx.so: library for API xxx
@end itemize

@c ***************************************************************************
@node logging
@subsubsection logging

@itemize @bullet
@item services and daemons use their directory name in GNUNET_log_setup (i.e.
'core') and log using plain 'GNUNET_log'.
@item command-line tools use their full name in GNUNET_log_setup (i.e.
'gnunet-publish') and log using plain 'GNUNET_log'.
@item service access libraries log using 'GNUNET_log_from' and use
'DIRNAME-api' for the component (i.e. 'core-api')
@item pure libraries (without associated service) use 'GNUNET_log_from' with
the component set to their library name (without lib or '.so'), which should
also be their directory name (i.e. 'nat')
@item plugins should use 'GNUNET_log_from' with the directory name and the
plugin name combined to produce the component name (i.e. 'transport-tcp').
@item logging should be unified per-file by defining a LOG macro with the
appropriate arguments, along these lines:
@example
#define LOG(kind,...) GNUNET_log_from (kind, "example-api", __VA_ARGS__)
@end example
@end itemize

@c ***************************************************************************
@node configuration
@subsubsection configuration

@itemize @bullet
@item paths (that are substituted in all filenames) are in PATHS (have as few
as possible)
@item all options for a particular module (src/MODULE) are under [MODULE]
@item options for a plugin of a module are under [MODULE-PLUGINNAME]
@end itemize
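A hypothetical configuration fragment following these conventions (section
names follow the src/MODULE and [MODULE-PLUGINNAME] rules above; the concrete
option names and values are illustrative only):

```
[PATHS]
GNUNET_HOME = /tmp/gnunet-test

[transport]
PLUGINS = tcp

[transport-tcp]
PORT = 2086
```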

@c ***************************************************************************
@node exported symbols
@subsubsection exported symbols

@itemize @bullet
@item must start with "GNUNET_modulename_" and be defined in "modulename.c"
@item exceptions: those defined in gnunet_common.h
@end itemize

@c ***************************************************************************
@node private (library-internal) symbols (including structs and macros)
@subsubsection private (library-internal) symbols (including structs and macros)

@itemize @bullet
@item must NOT start with any prefix
@item must not be exported in a way that linkers could use them or other
libraries might see them via headers; they must be either declared/defined in
C source files or in headers that are in the respective directory under
src/modulename/ and NEVER be declared in src/include/.
@end itemize

@node testcases
@subsubsection testcases

@itemize @bullet
@item must be called "test_module-under-test_case-description.c"
@item "case-description" may be omitted if there is only one test
@end itemize

@c ***************************************************************************
@node performance tests
@subsubsection performance tests

@itemize @bullet
@item must be called "perf_module-under-test_case-description.c"
@item "case-description" may be omitted if there is only one performance test
@item Must only be run if HAVE_BENCHMARKS is satisfied
@end itemize

@c ***************************************************************************
@node src/ directories
@subsubsection src/ directories

@itemize @bullet
@item gnunet-NAME: end-user applications (i.e., gnunet-search, gnunet-arm)
@item gnunet-service-NAME: service processes with accessor library (i.e.,
gnunet-service-arm)
@item libgnunetNAME: accessor library (_service.h-header) or standalone library
(_lib.h-header)
@item gnunet-daemon-NAME: daemon process without accessor library (i.e.,
gnunet-daemon-hostlist) and no GNUnet management port
@item libgnunet_plugin_DIR_NAME: loadable plugins (i.e.,
libgnunet_plugin_transport_tcp)
@end itemize

@c ***************************************************************************
@node Coding style
@subsection Coding style

@itemize @bullet
@item GNU guidelines generally apply
@item Indentation is done with spaces, two per level, no tabs
@item C99 struct initialization is fine
@item declare only one variable per line, so

@example
int i;
int j;
@end example

instead of

@example
int i,j;
@end example

This helps keep diffs small and forces developers to think precisely about the
type of every variable. Note that @code{char *} is different from @code{const
char*} and @code{int} is different from @code{unsigned int} or @code{uint32_t}.
Each variable type should be chosen with care.

@item While @code{goto} should generally be avoided, having a @code{goto} to
the end of a function to a block of clean up statements (free, close, etc.) can
be acceptable.

@item Conditions should be written with constants on the left (to avoid
accidental assignment) and with the 'true' target being either the 'error' case
or the significantly simpler continuation. For example:

@example
if (0 != stat ("filename", &sbuf))
@{
  error ();
@}
else
@{
  /* handle normal case here */
@}
@end example


instead of
@example
if (stat ("filename," &sbuf) == 0) @{
  /* handle normal case here */
@} else @{ error(); @}
@end example


If possible, the error clause should be terminated with a 'return' (or 'goto'
to some cleanup routine) and in this case, the 'else' clause should be omitted:
@example
if (0 != stat ("filename", &sbuf)) @{ error(); return; @}
/* handle normal case here */
@end example


This serves to avoid deep nesting. The 'constants on the left' rule applies to
all constants (including @code{GNUNET_SCHEDULER_NO_TASK}, @code{NULL}, and
enums). With the two above rules (constants on left, errors in 'true' branch),
there is only one way to write most branches correctly.

@item Combined assignments and tests are allowed if they do not hinder code
clarity. For example, one can write:

@example
if (NULL == (value = lookup_function())) @{ error(); return; @}
@end example


@item Use @code{break} and @code{continue} wherever possible to avoid deep(er)
nesting. Thus, we would write:

@example
next = head;
while (NULL != (pos = next))
@{
  next = pos->next;
  if (! should_free (pos))
    continue;
  GNUNET_CONTAINER_DLL_remove (head, tail, pos);
  GNUNET_free (pos);
@}
@end example


instead of
@example
next = head;
while (NULL != (pos = next))
@{
  next = pos->next;
  if (should_free (pos))
  @{
    /* unnecessary nesting! */
    GNUNET_CONTAINER_DLL_remove (head, tail, pos);
    GNUNET_free (pos);
  @}
@}
@end example


@item We primarily use @code{for} and @code{while} loops. A @code{while} loop
is used if the method for advancing in the loop is not a straightforward
increment operation. In particular, we use:

@example
next = head;
while (NULL != (pos = next))
@{
  next = pos->next;
  if (! should_free (pos))
    continue;
  GNUNET_CONTAINER_DLL_remove (head, tail, pos);
  GNUNET_free (pos);
@}
@end example


to free entries in a list (as the iteration changes the structure of the list
due to the free, the equivalent @code{for} loop would no longer follow the
simple @code{for} paradigm of @code{for(INIT;TEST;INC)}). However, for loops
that do follow the simple @code{for} paradigm we do use @code{for}, even if it
involves linked lists:
@example
/* simple iteration over a linked list */
for (pos = head; NULL != pos; pos = pos->next)
@{
   use (pos);
@}
@end example


@item The first argument to all higher-order functions in GNUnet must be
declared to be of type @code{void *} and is reserved for a closure. We do not
use inner functions, as trampolines would conflict with setups that use
non-executable stacks. The first statement in a higher-order function, which
usually should be part of the variable declarations, should assign the
@code{cls} argument to the precise expected type. For example:
@example
int
callback (void *cls, char *args)
@{
  struct Foo *foo = cls;
  int other_variables;

  /* rest of function */
@}
@end example


@item It is good practice to write complex @code{if} expressions instead of
using deeply nested @code{if} statements. However, except for addition and
multiplication, all operators should use parens. This is fine:

@example
if ( (1 == foo) || ((0 == bar) && (x != y)) )
  return x;
@end example


However, this is not:
@example
if (1 == foo)
  return x;
if (0 == bar && x != y)
  return x;
@end example


Note that splitting the @code{if} statement above is debatable, as the
@code{return x} is a very trivial statement. However, once the logic after the
branch becomes more complicated (and is still identical), the "or" formulation
should certainly be used.

@item There should be two empty lines between the end of the function and the
comments describing the following function. There should be a single empty line
after the initial variable declarations of a function. If a function has no
local variables, there should be no initial empty line. If a long function
consists of several complex steps, those steps might be separated by an empty
line (possibly followed by a comment describing the following step). The code
should not contain empty lines in arbitrary places; if in doubt, it is likely
better to NOT have an empty line (this way, more code will fit on the screen).
@end itemize
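Several of these conventions can be seen together in the following small
sketch (the functions are hypothetical and purely illustrative):

```c
#include <stddef.h>

/* Hypothetical lookup; returns NULL if 'key' is unknown. */
const char *
lookup_value (int key)
{
  if (42 == key) /* constant on the left */
    return "answer";
  return NULL;
}

/* Returns 1 if 'key' maps to a value, 0 otherwise. */
int
have_value (int key)
{
  const char *value;

  /* combined assignment and test; error case handled first, then return */
  if (NULL == (value = lookup_value (key)))
    return 0;
  return 1;
}
```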

@c ***************************************************************************
@node Build-system
@section Build-system

If you have code that is likely not to compile, or build rules that you do not
want to trigger for most developers, use @code{if HAVE_EXPERIMENTAL} in your
@file{Makefile.am}. It is then OK to (temporarily) add non-compiling (or
known-not-to-port) code.
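For example, a @file{Makefile.am} fragment guarding a build rule might look
like the following sketch (the program name here is made up):

```makefile
# Only build this (possibly broken) tool when experimental
# code was enabled at configure time.
if HAVE_EXPERIMENTAL
bin_PROGRAMS += gnunet-experimental-tool
gnunet_experimental_tool_SOURCES = gnunet-experimental-tool.c
endif
```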

If you want to compile all testcases but NOT run them, run configure with the
@code{--enable-test-suppression} option.

If you want to run all testcases, including those that take a while, run
configure with the @code{--enable-expensive-testcases} option.

If you want to compile and run benchmarks, run configure with the
@code{--enable-benchmarks} option.

If you want to obtain code coverage results, run configure with the
@code{--enable-coverage} option and run the @file{coverage.sh} script in
@file{contrib/}.

@c ***************************************************************************
@node Developing extensions for GNUnet using the gnunet-ext template
@section Developing extensions for GNUnet using the gnunet-ext template


For developers who want to write extensions for GNUnet, we provide the
gnunet-ext template as an easy-to-use skeleton.

gnunet-ext contains the build environment and template files for the
development of GNUnet services, command line tools, APIs and tests.

First of all you have to obtain gnunet-ext from git:

@code{git clone https://gnunet.org/git/gnunet-ext.git}

The next step is to bootstrap and configure it. For configure you have to
provide the path containing GNUnet with @code{--with-gnunet=/path/to/gnunet}
and the prefix where you want to install the extension using
@code{--prefix=/path/to/install}:

@example
./bootstrap
./configure --prefix=/path/to/install --with-gnunet=/path/to/gnunet
@end example

When your GNUnet installation is not included in the default linker search
path, you have to add @code{/path/to/gnunet} to the file @file{/etc/ld.so.conf}
and run @code{ldconfig}, or add it to the environment variable
@code{LD_LIBRARY_PATH} by using

@code{export LD_LIBRARY_PATH=/path/to/gnunet/lib}

@c ***************************************************************************
@node Writing testcases
@section Writing testcases

Ideally, any non-trivial GNUnet code should be covered by automated testcases.
Testcases should reside in the same place as the code that is being tested. The
name of source files implementing tests should begin with "test_" followed by
the name of the file that contains the code that is being tested.

Testcases in GNUnet should be integrated with the autotools build system. This
way, developers and anyone building binary packages will be able to run all
testcases simply by running @code{make check}. The final testcases shipped with
the distribution should output at most some brief progress information and not
display debug messages by default. The success or failure of a testcase must be
indicated by returning zero (success) or non-zero (failure) from the main
method of the testcase. The integration with the autotools is relatively
straightforward and only requires modifications to the @code{Makefile.am} in
the directory containing the testcase. For a testcase testing the code in
@code{foo.c} the @code{Makefile.am} would contain the following lines:
@example
check_PROGRAMS = test_foo
TESTS = $(check_PROGRAMS)
test_foo_SOURCES = test_foo.c
test_foo_LDADD = $(top_builddir)/src/util/libgnunetutil.la
@end example

Naturally, other libraries used by the testcase may be specified in the
@code{LDADD} directive as necessary.
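A minimal @file{test_foo.c} following the conventions above might look like
this sketch, where @code{checked_add} stands in for the (hypothetical) code
under test:

```c
#include <stdio.h>

/* Hypothetical function under test. */
int
checked_add (int a, int b)
{
  return a + b;
}

/* Runs the checks; returns 0 on success and 1 on failure,
 * as expected by the autotools test harness. */
int
run_test_foo (void)
{
  if (5 != checked_add (2, 3))
  {
    fprintf (stderr, "checked_add (2, 3) returned wrong result\n");
    return 1;
  }
  return 0;
}
```

The @code{main} function would simply @code{return run_test_foo ();}, so that
the process exit code signals success or failure to @code{make check}.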

Often testcases depend on additional input files, such as a configuration file.
These support files have to be listed using the EXTRA_DIST directive in order
to ensure that they are included in the distribution. Example:
@example
EXTRA_DIST = test_foo_data.conf
@end example


Executing @code{make check} will run all testcases in the current directory and
all subdirectories. Testcases can be compiled individually by running
@code{make test_foo} and then invoked directly using @code{./test_foo}. Note
that due to the use of plugins in GNUnet, it is typically necessary to run
@code{make install} before running any testcases. Thus the canonical command
@code{make check install} has to be changed to @code{make install check} for
GNUnet.

@c ***************************************************************************
@node GNUnet's TESTING library
@section GNUnet's TESTING library

The TESTING library is used for writing testcases which involve starting a
single peer or multiple peers. While peers can also be started by testcases
using the ARM subsystem, the TESTING library provides an elegant way to do
this. The configurations of the peers are auto-generated from a given template
to have non-conflicting port numbers, ensuring that peers' services do not run
into bind errors. This is achieved by testing a port's availability by binding
a listening socket to it before allocating it to a service in the generated
configurations.
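The port-availability check can be sketched with plain sockets as follows
(illustrative only; this is not the actual TESTING implementation):

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if binding a listening socket to 'port' on the
 * loopback address succeeds (i.e. the port appears free),
 * 0 otherwise.  Port 0 asks the kernel for any free port. */
int
port_is_free (unsigned short port)
{
  struct sockaddr_in addr;
  int sock;
  int ret;

  sock = socket (AF_INET, SOCK_STREAM, 0);
  if (-1 == sock)
    return 0;
  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = htons (port);
  ret = bind (sock, (struct sockaddr *) &addr, sizeof (addr));
  close (sock);
  return 0 == ret;
}
```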

Another advantage of using TESTING is that it shortens testcase startup time:
the hostkeys for peers are copied from a pre-computed set of hostkeys instead
of being generated at peer startup, which can take a considerable amount of
time when starting multiple peers or when running on an embedded processor.

TESTING also allows certain services to be shared among peers. This feature is
invaluable when testing with multiple peers as it helps to reduce the number of
services run per peer and hence the total number of processes run per
testcase.

The TESTING library only handles creating, starting and stopping peers.
Features useful for testcases, such as connecting peers in a topology, are not
available in TESTING but are available in the TESTBED subsystem. Furthermore,
TESTING only creates peers on the localhost; by using TESTBED, testcases can
benefit from creating peers across multiple hosts.

@menu
* API::
* Finer control over peer stop::
* Helper functions::
* Testing with multiple processes::
@end menu

@c ***************************************************************************
@node API
@subsection API

TESTING abstracts a group of peers as a TESTING system. All peers in a system
share a common hostname, and no two services of these peers share the same
port or UNIX domain socket path.

A TESTING system can be created with the function
@code{GNUNET_TESTING_system_create()}, which returns a handle to the system.
This function takes a directory path which is used for generating the
configurations of peers, an IP address from which connections to the peers'
services should be allowed, the hostname to be used in the peers'
configurations, and an array of shared service specifications of type
@code{struct GNUNET_TESTING_SharedService}.

The shared service specification must specify the name of the service to share,
the configuration pertaining to that shared service and the maximum number of
peers that are allowed to share a single instance of the shared service.

A TESTING system created with @code{GNUNET_TESTING_system_create()} chooses
ports from the default range 12000 - 56000 while auto-generating configurations
for peers. This range can be customised with the function
@code{GNUNET_TESTING_system_create_with_portrange()}. This function is similar
to @code{GNUNET_TESTING_system_create()} except that it takes two additional
parameters: the start and end of the port range to use.

A TESTING system is destroyed with the function
@code{GNUNET_TESTING_system_destroy()}. This function takes the handle of the
system and a flag indicating whether the files created in the directory used to
generate configurations should be removed.

A peer is created with the function @code{GNUNET_TESTING_peer_configure()}.
This function takes the system handle, a configuration template from which the
configuration for the peer is auto-generated, and the index from which the
hostkey for the peer is to be copied. When successful, this function returns a
handle to the peer which can be used to start and stop it and to obtain the
identity of the peer. If unsuccessful, a NULL pointer is returned along with an
error message. This function adjusts the generated configuration so that it has
non-conflicting ports and paths.

Peers can be started and stopped by calling the functions
@code{GNUNET_TESTING_peer_start()} and @code{GNUNET_TESTING_peer_stop()}
respectively. A peer can be destroyed by calling the function
@code{GNUNET_TESTING_peer_destroy()}. When a peer is destroyed, the ports and
paths allocated in its configuration are reclaimed for use by new peers.
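The call sequence described above can be summarised in the following rough
pseudocode sketch (argument lists are abbreviated and parameter names are
invented here; consult @file{gnunet_testing_lib.h} for the exact signatures):

```
/* create a system (no shared services in this sketch) */
system = GNUNET_TESTING_system_create ("/tmp/test-dir", "127.0.0.1",
                                       NULL, NULL);

/* create, start, stop and destroy one peer */
peer = GNUNET_TESTING_peer_configure (system, template_cfg, 0, ...);
GNUNET_TESTING_peer_start (peer);
/* ... run the actual test ... */
GNUNET_TESTING_peer_stop (peer);
GNUNET_TESTING_peer_destroy (peer);

/* destroy the system; remove the generated files */
GNUNET_TESTING_system_destroy (system, GNUNET_YES);
```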

@c ***************************************************************************
@node Finer control over peer stop
@subsection Finer control over peer stop

Using @code{GNUNET_TESTING_peer_stop()} is normally fine for testcases.
However, calling this function for each peer is inefficient when trying to
shut down multiple peers, as this function sends the termination signal to the
given peer process and waits for it to terminate. It would be faster in this
case to send the termination signals to the peers first and then wait on them.
This is accomplished by the function @code{GNUNET_TESTING_peer_kill()}, which
sends a termination signal to the peer, and the function
@code{GNUNET_TESTING_peer_wait()}, which waits on the peer.

Further finer control can be achieved by choosing to stop a peer asynchronously
with the function @code{GNUNET_TESTING_peer_stop_async()}. This function takes
a callback parameter and a closure for it in addition to the handle to the peer
to stop. The callback function is called with the given closure when the peer
is stopped. Using this function eliminates blocking while waiting for the peer
to terminate.

An asynchronous peer stop can be cancelled by calling the function
@code{GNUNET_TESTING_peer_stop_async_cancel()}. Note that calling this function
does not prevent the peer from terminating if the termination signal has
already been sent to it. It does, however, cancel the callback to be called
when the peer is stopped.

@c ***************************************************************************
@node Helper functions
@subsection Helper functions

Most of the testcases can benefit from an abstraction which configures a peer
and starts it. This is provided by the function
@code{GNUNET_TESTING_peer_run()}. This function takes the testing directory
pathname, a configuration template, a callback and its closure. This function
creates a peer in the given testing directory by using the configuration
template, starts the peer and calls the given callback with the given closure.

The function @code{GNUNET_TESTING_peer_run()} starts the ARM service of the
peer which starts the rest of the configured services. A similar function,
@code{GNUNET_TESTING_service_run()}, can be used to start just a single service
of a peer. In this case, the peer's ARM service is not started; instead, only
the given service is run.

@c ***************************************************************************
@node Testing with multiple processes
@subsection Testing with multiple processes

When testing GNUnet, the splitting of the code into services and clients often
complicates testing. The solution to this is to have the testcase fork
@code{gnunet-service-arm}, ask it to start the required server and daemon
processes, and then execute appropriate client actions (to test the client APIs
or the core module or both). If necessary, multiple ARM services can be forked
using different ports (!) to simulate a network. However, most of the time only
one ARM process is needed. Note that on exit, the testcase should shut down ARM
with a @code{TERM} signal (to give it the chance to cleanly stop its child
processes).

The following code illustrates spawning and killing an ARM process from a
testcase:
@example
static void
run (void *cls, char *const *args, const char *cfgfile,
     const struct GNUNET_CONFIGURATION_Handle *cfg)
@{
  struct GNUNET_OS_Process *arm_pid;

  arm_pid = GNUNET_OS_start_process (NULL, NULL,
                                     "gnunet-service-arm",
                                     "gnunet-service-arm",
                                     "-c", cfgname, NULL);
  /* do real test work here */
  if (0 != GNUNET_OS_process_kill (arm_pid, SIGTERM))
    GNUNET_log_strerror (GNUNET_ERROR_TYPE_WARNING, "kill");
  GNUNET_assert (GNUNET_OK == GNUNET_OS_process_wait (arm_pid));
  GNUNET_OS_process_close (arm_pid);
@}

GNUNET_PROGRAM_run (argc, argv, "NAME-OF-TEST", "nohelp", options, &run, cls);
@end example


An alternative way that works well to test plugins is to implement a
mock-version of the environment that the plugin expects and then to simply load
the plugin directly.

@c ***************************************************************************
@node Performance regression analysis with Gauger
@section Performance regression analysis with Gauger

To help avoid performance regressions, GNUnet uses Gauger. Gauger is a simple
logging tool that allows remote hosts to send performance data to a central
server, where this data can be analyzed and visualized. Gauger shows graphs of
the repository revisions and the performance data recorded for each revision,
so sudden performance peaks or drops can be identified and linked to a specific
revision number.

In the case of GNUnet, the buildbots log the performance data obtained during
the tests after each build. The data can be accessed on GNUnet's Gauger page.

The menu on the left allows one to select either the results of just one
buildbot (under "Hosts") or to review the data from all hosts for a given test
result (under "Metrics"). If the absolute values of the results differ greatly,
for instance between arm and amd64 machines, the "Normalize" option on a metric
view can help to get an idea of the performance evolution across all hosts.

Using Gauger in GNUnet and having the performance of a module tracked over time
is very easy. First of course, the testcase must generate some consistent
metric which makes sense to log. Highly volatile or randomness-dependent
metrics are probably not ideal candidates for meaningful regression detection.

To start logging any value, just include @code{gauger.h} in your testcase code.
Then, use the macro @code{GAUGER()} to make the buildbots log whatever value is
of interest to you to @code{gnunet.org}'s Gauger server. No setup is necessary,
as most buildbots already have everything in place and new metrics are created
on demand. To delete a metric, you need to contact a member of the GNUnet
development team (a file will need to be removed manually from the respective
directory).

The code in the test should look like this:
@example
[other includes]
#include <gauger.h>

int
main (int argc, char *argv[])
@{
  [run test, generate data]

  GAUGER ("YOUR_MODULE", "METRIC_NAME", (float) value, "UNIT");
@}
@end example


Where:
@table @asis

@item @strong{YOUR_MODULE}
is a category in the Gauger page and should be the name of the module or
subsystem, like "Core" or "DHT".
@item @strong{METRIC_NAME}
is the name of the metric being collected and should be concise and
descriptive, like "PUT operations in sqlite-datastore".
@item @strong{value}
is the value of the metric that is logged for this run.
@item @strong{UNIT}
is the unit in which the value is measured, for instance "kb/s" or
"kb of RAM/node".
@end table

If you wish to use Gauger for your own project, you can grab a copy of the
latest stable release or check out Gauger's Subversion repository.

@c ***************************************************************************
@node GNUnet's TESTBED Subsystem
@section GNUnet's TESTBED Subsystem

The TESTBED subsystem facilitates testing and measuring of multi-peer
deployments on a single host or over multiple hosts.

The architecture of the testbed module is divided into the following:
@itemize @bullet

@item Testbed API: An API which is used by the testing driver programs. It
provides functions for creating, destroying, starting and stopping peers,
etc.

@item Testbed service (controller): A service which is started through the
Testbed API. This service handles operations to create, destroy, start and
stop peers, connect them, and modify their configurations.

@item Testbed helper: When a controller has to be started on a host, the
testbed API starts the testbed helper on that host, which in turn starts the
controller. The testbed helper receives a configuration for the controller
through its stdin and changes it to ensure that the controller doesn't run into
any port conflicts on that host.
@end itemize


The testbed service (controller) is different from the other GNUnet services in
that it is not started by ARM and is not supposed to run as a daemon. It is
started by the testbed API through a testbed helper. In a typical scenario
involving multiple hosts, a controller is started on each host. Controllers
take up the actual task of creating peers, and starting and stopping them on
the hosts they run on.

While running deployments on a single localhost, the testbed API starts the
testbed helper directly as a child process. When running deployments on remote
hosts, the testbed API starts testbed helpers on each remote host through a
remote shell. By default, the testbed API uses SSH as the remote shell. This
can be changed by setting the environment variable GNUNET_TESTBED_RSH_CMD to
the required remote shell program. This variable can also contain parameters
which are to be passed to the remote shell program. For example:

@example
export GNUNET_TESTBED_RSH_CMD="ssh -o BatchMode=yes \
  -o NoHostAuthenticationForLocalhost=yes %h"
@end example

Substitutions are allowed in the above command string through placemarks which
begin with a `%'. At present the following substitutions are supported:
@itemize @bullet
@item
%h: hostname
@item
%u: username
@item
%p: port
@end itemize

Note that a substitution placemark is replaced only when the corresponding
field is available, and only once. Specifying @code{%u@@%h} doesn't work
either. If you want to use username substitution for SSH, use the argument
@code{-l} before the username substitution, e.g. @code{ssh -l %u -p %p %h}.
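The effect of placemark substitution can be illustrated with a small
self-contained sketch (not the actual TESTBED code; this toy version replaces
every occurrence rather than just the first):

```c
#include <stdio.h>
#include <string.h>

/* Replaces each %h, %u and %p in 'cmd' with the given values,
 * writing the result into 'out' (of size 'out_len'). */
void
expand_rsh_cmd (const char *cmd, const char *host, const char *user,
                const char *port, char *out, size_t out_len)
{
  size_t used = 0;

  out[0] = '\0';
  while (('\0' != *cmd) && (used + 1 < out_len))
  {
    const char *rep = NULL;

    if ('%' == cmd[0])
    {
      if ('h' == cmd[1])
        rep = host;
      else if ('u' == cmd[1])
        rep = user;
      else if ('p' == cmd[1])
        rep = port;
    }
    if (NULL != rep)
    {
      /* substitute the placemark */
      used += (size_t) snprintf (out + used, out_len - used, "%s", rep);
      cmd += 2;
    }
    else
    {
      /* copy ordinary characters verbatim */
      out[used++] = *cmd++;
      out[used] = '\0';
    }
  }
}
```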

The testbed API and the helper communicate through the helper's stdin and
stdout. As the helper is started through a remote shell on remote hosts, any
output messages from the remote shell interfere with the communication and
result in a failure while starting the helper. For this reason, it is
suggested to use flags to make the remote shell produce no output messages and
to have password-less logins. For the default remote shell, SSH, the default
options are "-o BatchMode=yes -o NoHostAuthenticationForLocalhost=yes".
Password-less logins should be ensured by using SSH keys.

Since the testbed API executes the remote shell as a non-interactive shell,
certain scripts like @file{.bashrc} and @file{.profile} may not be executed. If
this is the case, the testbed API can be forced to execute an interactive shell
by setting the environment variable @code{GNUNET_TESTBED_RSH_CMD_SUFFIX} to a
shell program. An example could be:

@example
export GNUNET_TESTBED_RSH_CMD_SUFFIX="sh -lc"
@end example

The testbed API will then execute the remote shell program as:

@example
$GNUNET_TESTBED_RSH_CMD -p $port $dest $GNUNET_TESTBED_RSH_CMD_SUFFIX gnunet-helper-testbed
@end example

On some systems, problems may arise while starting testbed helpers if GNUnet is
installed into a custom location, since the helper may not be found in the
standard path. This can be addressed by setting the variable
@code{HELPER_BINARY_PATH} to the path of the testbed helper. The testbed API
will then use this path to start helper binaries both locally and remotely.

The testbed API can be accessed by including the
@file{gnunet_testbed_service.h} header and linking with
@code{-lgnunettestbed}.



@c ***************************************************************************
@menu
* Supported Topologies::
* Hosts file format::
* Topology file format::
* Testbed Barriers::
* Automatic large-scale deployment of GNUnet in the PlanetLab testbed::
* TESTBED Caveats::
@end menu

@node Supported Topologies
@subsection Supported Topologies

While testing multi-peer deployments, it is often necessary for the peers to be
connected in some topology. This requirement is addressed by the function
@code{GNUNET_TESTBED_overlay_connect()}, which connects any given two peers in
the testbed.

The API also provides a helper function
@code{GNUNET_TESTBED_overlay_configure_topology()} to connect a given set of
peers in any of the following supported topologies:
@itemize @bullet

@item @code{GNUNET_TESTBED_TOPOLOGY_CLIQUE}: All peers are connected with each
other

@item @code{GNUNET_TESTBED_TOPOLOGY_LINE}: Peers are connected to form a line

@item @code{GNUNET_TESTBED_TOPOLOGY_RING}: Peers are connected to form a ring
topology

@item @code{GNUNET_TESTBED_TOPOLOGY_2D_TORUS}: Peers are connected to form a
two-dimensional torus topology. If the number of peers is not a perfect square,
the resulting torus may not have uniform poloidal and toroidal lengths.

@item @code{GNUNET_TESTBED_TOPOLOGY_ERDOS_RENYI}: Topology is generated to form
a random graph. The number of links to be present must be given.

@item @code{GNUNET_TESTBED_TOPOLOGY_SMALL_WORLD}: Peers are connected to form a
2D torus with some random links among them. The number of random links must be
given.

@item @code{GNUNET_TESTBED_TOPOLOGY_SMALL_WORLD_RING}: Peers are connected to
form a ring with some random links among them. The number of random links must
be given.

@item @code{GNUNET_TESTBED_TOPOLOGY_SCALE_FREE}: Connects peers in a topology
where peer connectivity follows a power law: new peers are connected with high
probability to well-connected peers. See "Emergence of Scaling in Random
Networks", Science 286, 509-512, 1999.

@item @code{GNUNET_TESTBED_TOPOLOGY_FROM_FILE}: The topology information is
loaded from a file. The path to the file has to be given. See Topology file
format for the format of this file.

@item @code{GNUNET_TESTBED_TOPOLOGY_NONE}: No topology
@end itemize


The above supported topologies can be specified respectively by setting the
variable @code{OVERLAY_TOPOLOGY} to the following values in the configuration
passed to Testbed API functions @code{GNUNET_TESTBED_test_run()} and
@code{GNUNET_TESTBED_run()}:
@itemize @bullet
@item @code{CLIQUE}
@item @code{RING}
@item @code{LINE}
@item @code{2D_TORUS}
@item @code{RANDOM}
@item @code{SMALL_WORLD}
@item @code{SMALL_WORLD_RING}
@item @code{SCALE_FREE}
@item @code{FROM_FILE}
@item @code{NONE}
@end itemize


Topologies @code{RANDOM}, @code{SMALL_WORLD} and @code{SMALL_WORLD_RING}
require the option @code{OVERLAY_RANDOM_LINKS} to be set to the number of
random links to be generated in the configuration. The option will be ignored
for the rest of the topologies.
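For example, a configuration fragment selecting a small-world topology might
look like the following sketch (the section name @code{[testbed]} is assumed
here):

```
[testbed]
OVERLAY_TOPOLOGY = SMALL_WORLD
OVERLAY_RANDOM_LINKS = 25
```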

The @code{SCALE_FREE} topology requires the option
@code{SCALE_FREE_TOPOLOGY_CAP} to be set to the maximum number of peers which
can connect to a peer, and @code{SCALE_FREE_TOPOLOGY_M} to be set to the
minimum number of peers each peer should be connected to.

Similarly, the topology @code{FROM_FILE} requires the option
@code{OVERLAY_TOPOLOGY_FILE} to contain the path of the file containing the
topology information. This option is ignored for the rest of the topologies.
See Topology file format for the format of this file.

@c ***************************************************************************
@node Hosts file format
@subsection Hosts file format

The testbed API offers the function
@code{GNUNET_TESTBED_hosts_load_from_file()} to load from a given file details
about the hosts which testbed can use for deploying peers. This function is
useful to keep the data about hosts separate instead of hard-coding it in the
code.

Another helper function from the testbed API, @code{GNUNET_TESTBED_run()}, also
takes a hosts file name as its parameter. It uses the above function to
populate the hosts data structures and start controllers to deploy peers.

These functions require the hosts file to be of the following format:
@itemize @bullet
@item Each line is interpreted to have details about one host
@item Host details should include the username to use for logging into the
host, the hostname of the host, and the port number to use for the remote shell
program. All three values must be given.
@item These details should be given in the following format:
@code{<username>@@<hostname>:<port>}
@end itemize

Note that having canonical hostnames may cause problems while resolving the IP
addresses (see this bug). Hence it is advised to provide the hosts' numerical
IP addresses as hostnames whenever possible.
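A hosts file following this format might look like the following sketch
(usernames and addresses are made up):

```
testuser@192.168.0.2:22
testuser@192.168.0.3:2022
alice@10.0.0.5:22
```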

@c ***************************************************************************
@node Topology file format
@subsection Topology file format

A topology file describes how peers are to be connected. It should adhere to
the following format for testbed to parse it correctly.

Each line should begin with the target peer id. This should be followed by a
colon (`:') and origin peer ids separated by `|'. All spaces except for newline
characters are ignored. The API will then try to connect each origin peer to
the target peer.

For example, the following file will result in 5 overlay connections: [2->1],
[3->1], [4->3], [0->3], [2->0]:
@example
1:2|3
3:4| 0
0: 2
@end example
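To make the format concrete, a simplified parser for a single topology line
could look like the following sketch (illustrative only; this is not the actual
TESTBED parser):

```c
#include <stdlib.h>
#include <string.h>

/* Parses one line of the form "<target>:<origin>|<origin>|...".
 * Stores up to 'max' origin ids in 'origins' and the target id in
 * 'target'; returns the number of origins parsed, or -1 if the
 * line has no colon. */
int
parse_topology_line (const char *line, unsigned int *target,
                     unsigned int *origins, unsigned int max)
{
  const char *colon = strchr (line, ':');
  const char *p;
  unsigned int n = 0;

  if (NULL == colon)
    return -1;
  *target = (unsigned int) strtoul (line, NULL, 10);
  p = colon + 1;
  while (('\0' != *p) && (n < max))
  {
    origins[n++] = (unsigned int) strtoul (p, NULL, 10); /* skips spaces */
    p = strchr (p, '|');
    if (NULL == p)
      break;
    p++;
  }
  return (int) n;
}
```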

@c ***************************************************************************
@node Testbed Barriers
@subsection Testbed Barriers

The testbed subsystem's barriers API facilitates coordination among the peers
run by the testbed and the experiment driver. The concept is similar to the
barrier synchronisation mechanism found in parallel programming or
multi-threading paradigms: a peer waits at a barrier upon reaching it until the
barrier is reached by a predefined number of peers. This predefined number of
peers required to cross a barrier is also called the quorum. We say a peer has
reached a barrier if the peer is waiting for the barrier to be crossed.
Similarly, a barrier is said to be reached if the required quorum of peers
reach the barrier. A barrier which is reached is deemed crossed after all the
peers waiting on it are notified.
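The quorum logic can be pictured with a tiny toy model (this is only a
conceptual illustration, not the barriers API):

```c
/* Toy model of a barrier: 'quorum' peers must reach it
 * before it is considered reached. */
struct ToyBarrier
{
  unsigned int quorum;  /* peers required */
  unsigned int waiting; /* peers currently waiting */
};

/* A peer reaches the barrier; returns 1 if the barrier is now
 * reached (and the waiting peers would be notified), 0 if the
 * peer must keep waiting. */
int
toy_barrier_reach (struct ToyBarrier *b)
{
  b->waiting++;
  return b->waiting >= b->quorum;
}
```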

The barriers API provides the following functions:
@itemize @bullet
@item @strong{@code{GNUNET_TESTBED_barrier_init()}:} function to initialise a
barrier in the experiment
@item @strong{@code{GNUNET_TESTBED_barrier_cancel()}:} function to cancel a
barrier which has been initialised before
@item @strong{@code{GNUNET_TESTBED_barrier_wait()}:} function to signal barrier
service that the caller has reached a barrier and is waiting for it to be
crossed
@item @strong{@code{GNUNET_TESTBED_barrier_wait_cancel()}:} function to stop
waiting for a barrier to be crossed
@end itemize


Among the above functions, the first two, namely
@code{GNUNET_TESTBED_barrier_init()} and @code{GNUNET_TESTBED_barrier_cancel()}
are used by experiment drivers. All barriers should be initialised by the
experiment driver by calling @code{GNUNET_TESTBED_barrier_init()}. This
function takes a name to identify the barrier, the quorum required for the
barrier to be crossed and a notification callback for notifying the experiment
driver when the barrier is crossed. @code{GNUNET_TESTBED_barrier_cancel()}
cancels an initialised barrier and frees the resources allocated for it. This
function can be called on an initialised barrier before it is crossed.

The remaining two functions @code{GNUNET_TESTBED_barrier_wait()} and
@code{GNUNET_TESTBED_barrier_wait_cancel()} are used in the peer's processes.
@code{GNUNET_TESTBED_barrier_wait()} connects to the local barrier service
running on the same host the peer is running on and registers that the caller
has reached the barrier and is waiting for the barrier to be crossed. Note that
this function can only be used by peers which are started by the testbed, as
this function tries to access the local barrier service which is part of the
testbed controller service. Calling @code{GNUNET_TESTBED_barrier_wait()} on an
uninitialised barrier results in failure.
@code{GNUNET_TESTBED_barrier_wait_cancel()} cancels the notification registered
by @code{GNUNET_TESTBED_barrier_wait()}.


@c ***************************************************************************
@menu
* Implementation::
@end menu

@node Implementation
@subsubsection Implementation

Since barriers involve coordination between experiment driver and peers, the
barrier service in the testbed controller is split into two components. The
first component responds to the message generated by the barrier API used by
the experiment driver (functions @code{GNUNET_TESTBED_barrier_init()} and
@code{GNUNET_TESTBED_barrier_cancel()}) and the second component to the
messages generated by barrier API used by peers (functions
@code{GNUNET_TESTBED_barrier_wait()} and
@code{GNUNET_TESTBED_barrier_wait_cancel()}).

Calling @code{GNUNET_TESTBED_barrier_init()} sends a
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_INIT} message to the master
controller. The master controller then registers a barrier and calls
@code{GNUNET_TESTBED_barrier_init()} for each of its subcontrollers. In this
way barrier initialisation is propagated down the controller hierarchy. While
propagating initialisation, any error at a subcontroller, such as a timeout
during further propagation, is reported up the hierarchy back to the
experiment driver.

Similar to @code{GNUNET_TESTBED_barrier_init()},
@code{GNUNET_TESTBED_barrier_cancel()} propagates a
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_CANCEL} message, which causes
controllers to remove an initialised barrier.

The second component is implemented as a separate service in the binary
`gnunet-service-testbed' which already has the testbed controller service.
Although this deviates from the GNUnet process architecture of having one
service per binary, it is needed in this case as this component needs access to
barrier data created by the first component. This component responds to
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT} messages from local peers when
they call @code{GNUNET_TESTBED_barrier_wait()}. Upon receiving such a message,
the service checks whether the requested barrier has been initialised. If it
has not, an error status is sent in a
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message to the local peer
and the connection from the peer is terminated. If the barrier has been
initialised, the barrier's counter of reached peers is incremented and a
notification is registered to notify the peer when the barrier is reached.
The connection from the peer is left open.

Once enough peers to attain the quorum have sent
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT} messages, the controller sends
a @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message to its parent,
informing it that the barrier is crossed. If the controller has started further
subcontrollers, it delays this message until it receives a similar notification
from each of those subcontrollers. Finally, the barriers API at the experiment
driver receives the @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message
when the barrier is reached at all the controllers.

The barriers API at the experiment driver responds to the
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message by echoing it back to
the master controller and notifying the experiment driver through the
notification callback that a barrier has been crossed. The echoed
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message is propagated by the
master controller to the controller hierarchy. This propagation triggers the
notifications registered by peers at each of the controllers in the hierarchy.
Note the difference between this downward propagation of the
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message and its upward
propagation: the upward propagation ensures that the barrier has been reached
by all the controllers, while the downward propagation triggers the
notifications that the barrier is crossed.
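The delay rule for the upward propagation can be summarized in a few lines (a sketch, not controller code; all names are illustrative):

```c
#include <assert.h>

/* Sketch of the upward propagation rule: a controller reports
   "barrier reached" to its parent only once its own quorum is met
   AND every subcontroller it started has reported the same. */
struct Controller
{
  int local_reached;          /* local quorum met? */
  unsigned int n_subs;        /* subcontrollers started */
  unsigned int subs_reported; /* subcontrollers that sent STATUS up */
};

static int
may_report_up (const struct Controller *c)
{
  return c->local_reached && (c->subs_reported == c->n_subs);
}
```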

@c ***************************************************************************
@node Automatic large-scale deployment of GNUnet in the PlanetLab testbed
@subsection Automatic large-scale deployment of GNUnet in the PlanetLab testbed

PlanetLab is a testbed for computer networking and distributed systems
research. It was established in 2002 and as of June 2010 was composed of 1090
nodes at 507 sites worldwide.

To automate the deployment of GNUnet, we created a set of automation tools to
simplify large-scale deployment. We provide a set of scripts you can use to
deploy GNUnet on a set of nodes and manage your installation.

Please also check @uref{https://gnunet.org/installation-fedora8-svn} and
@uref{https://gnunet.org/installation-fedora12-svn} for detailed
instructions on how to install GNUnet on a PlanetLab node.


@c ***************************************************************************
@menu
* PlanetLab Automation for Fedora8 nodes::
* Install buildslave on PlanetLab nodes running fedora core 8::
* Setup a new PlanetLab testbed using GPLMT::
* Why do i get an ssh error when using the regex profiler?::
@end menu

@node PlanetLab Automation for Fedora8 nodes
@subsubsection PlanetLab Automation for Fedora8 nodes

@c ***************************************************************************
@node Install buildslave on PlanetLab nodes running fedora core 8
@subsubsection Install buildslave on PlanetLab nodes running fedora core 8
@c ** Actually this is a subsubsubsection, but must be fixed differently
@c ** as subsubsection is the lowest.

Since most of the PlanetLab nodes are running the very old Fedora Core 8
image, installing the buildslave software is quite painful. For our PlanetLab
testbed we figured out how best to install the buildslave software.

Install Distribute for Python:

@example
curl http://python-distribute.org/distribute_setup.py | sudo python
@end example

Install zope.interface <= 3.8.0 (4.0 and 4.0.1 will not work):

@example
wget http://pypi.python.org/packages/source/z/zope.interface/zope.interface-3.8.0.tar.gz
tar xvfz zope.interface-3.8.0.tar.gz
cd zope.interface-3.8.0
sudo python setup.py install
@end example

Install the buildslave software (0.8.6 was the latest version at the time of
writing):

@example
wget http://buildbot.googlecode.com/files/buildbot-slave-0.8.6p1.tar.gz
tar xvfz buildbot-slave-0.8.6p1.tar.gz
cd buildbot-slave-0.8.6p1
sudo python setup.py install
@end example

The setup will download the matching Twisted package and install it. It will
also try to install the latest version of zope.interface, which will fail to
install. Buildslave will work anyway, since version 3.8.0 was installed
before.

@c ***************************************************************************
@node Setup a new PlanetLab testbed using GPLMT
@subsubsection Setup a new PlanetLab testbed using GPLMT

@itemize @bullet
@item Get a new slice and assign nodes
Ask your PlanetLab PI to give you a new slice and assign the nodes you need.
@item Install a buildmaster
You can stick to the buildbot documentation:
@uref{http://buildbot.net/buildbot/docs/current/manual/installation.html}
@item Install the buildslave software on all nodes
To install the buildslave on all nodes assigned to your slice you can use the
tasklist @code{install_buildslave_fc8.xml} provided with GPLMT:

@example
./gplmt.py -c contrib/tumple_gnunet.conf -t \
    contrib/tasklists/install_buildslave_fc8.xml -a -p <planetlab password>
@end example

@item Create the buildmaster configuration and the slave setup commands

The master and the slaves need to have credentials, and the master has to have
all nodes configured. This can be done with the
@code{create_buildbot_configuration.py} script in the @code{scripts}
directory.

This script takes a list of nodes, retrieved directly from PlanetLab or read
from a file, and a configuration template, and creates:
@itemize
@item a tasklist which can be executed with gplmt to set up the slaves
@item a @code{master.cfg} file containing the PlanetLab nodes
@end itemize

A configuration template is included in the @code{contrib} directory. Most
importantly, the script replaces the following tags in the template:

@example
%GPLMT_BUILDER_DEFINITION
%GPLMT_BUILDER_SUMMARY
%GPLMT_SLAVES
%GPLMT_SCHEDULER_BUILDERS
@end example

Create a configuration for all nodes assigned to a slice:

@example
./create_buildbot_configuration.py -u <planetlab username> \
    -p <planetlab password> -s <slice> -m <buildmaster+port> -t <template>
@end example

Create a configuration for some nodes listed in a file:

@example
./create_buildbot_configuration.py -f <node_file> -m <buildmaster+port> \
    -t <template>
@end example

@item Copy the @code{master.cfg} to the buildmaster and start it
Use @code{buildbot start <basedir>} to start the server.
@item Setup the buildslaves
@end itemize

@c ***************************************************************************
@node Why do i get an ssh error when using the regex profiler?
@subsubsection Why do i get an ssh error when using the regex profiler?

Why do I get an ssh error "Permission denied (publickey,password)." when using
the regex profiler, although passwordless ssh to localhost works using
publickey and ssh-agent?

You have to generate a public/private key pair with no password:

@example
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_localhost
@end example

and then add the following to your @file{~/.ssh/config} file:

@example
Host 127.0.0.1
IdentityFile ~/.ssh/id_localhost
@end example

Now make sure your hostsfile looks like this:

@example
[USERNAME]@@127.0.0.1:22
[USERNAME]@@127.0.0.1:22
@end example

You can test your setup by running @code{ssh 127.0.0.1} in a terminal and then
running it again in the opened session. If you were not asked for a password
on either login, you should be good to go.

@c ***************************************************************************
@node TESTBED Caveats
@subsection TESTBED Caveats

This section documents a few caveats when using the GNUnet testbed
subsystem.


@c ***************************************************************************
@menu
* CORE must be started::
* ATS must want the connections::
@end menu

@node CORE must be started
@subsubsection CORE must be started

A simple issue is #3993: your configuration MUST somehow ensure that for each
peer the CORE service is started when the peer is set up, otherwise TESTBED
may fail to connect peers when the topology is initialized, as TESTBED will
start some CORE services but not necessarily all of them (yet it relies on all
of them running). The easiest way is to set @code{FORCESTART = YES} in the
@code{[core]} section of the configuration file. Alternatively, having any
service that directly or indirectly depends on CORE started with FORCESTART
will also do. This issue largely arises when users try to over-optimize by not
starting any services with FORCESTART.
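For example, the following configuration fragment ensures CORE is started for
every peer, as suggested above:

```
[core]
FORCESTART = YES
```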

@c ***************************************************************************
@node ATS must want the connections
@subsubsection ATS must want the connections

When TESTBED sets up connections, it only offers the respective HELLO
information to the TRANSPORT service. It is then up to the ATS service to
@strong{decide} to use the connection. The ATS service will typically eagerly
establish any connection if the number of total connections is low (relative to
bandwidth). Details may further depend on the specific ATS backend that was
configured. If ATS decides to NOT establish a connection (even though TESTBED
provided the required information), then that connection will count as failed
for TESTBED. Note that you can configure TESTBED to tolerate a certain number
of connection failures (see '-e' option of gnunet-testbed-profiler). This issue
largely arises for dense overlay topologies, especially if you try to create
cliques with more than 20 peers.

@c ***************************************************************************
@node libgnunetutil
@section libgnunetutil

libgnunetutil is the fundamental library that all GNUnet code builds upon.
Ideally, this library should contain most of the platform dependent code
(except for user interfaces and really special needs that only few applications
have). It is also supposed to offer basic services that most if not all GNUnet
binaries require. The code of libgnunetutil is in the src/util/ directory. The
public interface to the library is in the gnunet_util.h header. The functions
provided by libgnunetutil fall roughly into the following categories (in
roughly the order of importance for new developers):
@itemize @bullet
@item logging (common_logging.c)
@item memory allocation (common_allocation.c)
@item endianness conversion (common_endian.c)
@item internationalization (common_gettext.c)
@item String manipulation (string.c)
@item file access (disk.c)
@item buffered disk IO (bio.c)
@item time manipulation (time.c)
@item configuration parsing (configuration.c)
@item command-line handling (getopt*.c)
@item cryptography (crypto_*.c)
@item data structures (container_*.c)
@item CPS-style scheduling (scheduler.c)
@item Program initialization (program.c)
@item Networking (network.c, client.c, server*.c, service.c)
@item message queueing (mq.c)
@item bandwidth calculations (bandwidth.c)
@item Other OS-related (os*.c, plugin.c, signal.c)
@item Pseudonym management (pseudonym.c)
@end itemize

It should be noted that only developers who fully understand this entire API
will be able to write good GNUnet code.

Ideally, porting GNUnet should only require porting the gnunetutil library.
More testcases for the gnunetutil APIs are therefore a great way to make
porting of GNUnet easier.

@menu
* Logging::
* Interprocess communication API (IPC)::
* Cryptography API::
* Message Queue API::
* Service API::
* Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps::
* The CONTAINER_MDLL API::
@end menu

@c ***************************************************************************
@node Logging
@subsection Logging

GNUnet is able to log its activity, mostly for the purposes of debugging the
program at various levels.

@file{gnunet_common.h} defines several @strong{log levels}:
@table @asis

@item ERROR for errors (really problematic situations, often leading to
crashes)
@item WARNING for warnings (troubling situations that might have
negative consequences, although not fatal)
@item INFO for various information.
Used somewhat rarely, as GNUnet statistics is used to hold and display most of
the information that users might find interesting.
@item DEBUG for debugging.
Does not produce much output on normal builds, but when extra logging is
enabled at compile time, a staggering amount of data is output under this
log level.
@end table


Normal builds of GNUnet (configured with @code{--enable-logging[=yes]}) are
supposed to log nothing under DEBUG level. The @code{--enable-logging=verbose}
configure option can be used to create a build with all logging enabled.
However, such a build will produce large amounts of log data, which is
inconvenient when one tries to hunt down a specific problem.

To mitigate this problem, GNUnet provides facilities to apply a filter to
reduce the logs:
@table @asis

@item Logging by default.  When no log levels are configured in any other way
(see below), GNUnet will default to the WARNING log level. This mostly applies
to GNUnet command line utilities, services and daemons; tests will always set
the log level to WARNING or, if @code{--enable-logging=verbose} was passed to
configure, to DEBUG. The default level is suggested for normal operation.
@item The -L option.  Most GNUnet executables accept an "-L loglevel" or
"--log=loglevel" option. If used, it makes the process set a global log level
to "loglevel". Thus it is possible to run some processes with -L DEBUG, for
example, and others with -L ERROR to enable specific settings to diagnose
problems with a particular process.
@item Configuration files.  Because GNUnet
service and daemon processes are usually launched by gnunet-arm, it is not
possible to pass different custom command line options directly to every one of
them. The options passed to @code{gnunet-arm} only affect gnunet-arm and not
the rest of GNUnet. However, one can specify a configuration key "OPTIONS" in
the section that corresponds to a service or a daemon, and put a value of "-L
loglevel" there. This will make the respective service or daemon set its log
level to "loglevel" (as the value of OPTIONS will be passed as a command-line
argument).

To specify the same log level for all services without creating separate
"OPTIONS" entries in the configuration for each one, the user can specify a
config key "GLOBAL_POSTFIX" in the [arm] section of the configuration file. The
value of GLOBAL_POSTFIX will be appended to all command lines used by the ARM
service to run other services. It can contain any option valid for all GNUnet
commands, thus in particular the "-L loglevel" option. The ARM service itself
is, however, unaffected by GLOBAL_POSTFIX; to set log level for it, one has to
specify "OPTIONS" key in the [arm] section.
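As a sketch (the section names and levels are examples, not recommendations),
a per-service log level plus a global default could look like this:

```
[transport]
OPTIONS = -L DEBUG

[arm]
GLOBAL_POSTFIX = -L WARNING
OPTIONS = -L ERROR
```

Here GLOBAL_POSTFIX is appended to the command line of every service ARM
starts, while the OPTIONS key in the [arm] section sets the log level of ARM
itself.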
@item Environment variables.
Setting global per-process log levels with "-L loglevel" does not offer
sufficient log filtering granularity, as one service will call interface
libraries and supporting libraries of other GNUnet services, potentially
producing lots of debug log messages from these libraries. Also, changing the
config file is not always convenient (especially when running the GNUnet test
suite).@ To fix that, and to allow GNUnet to use different log filtering at
runtime without re-compiling the whole source tree, the log calls were changed
to be configurable at run time. To configure them one has to define environment
variables "GNUNET_FORCE_LOGFILE", "GNUNET_LOG" and/or "GNUNET_FORCE_LOG":
@itemize @bullet

@item "GNUNET_LOG" only affects the logging when no global log level is
configured by any other means (that is, the process does not explicitly set its
own log level, there are no "-L loglevel" options on command line or in
configuration files), and can be used to override the default WARNING log
level.

@item "GNUNET_FORCE_LOG" will completely override any other log configuration
options given.

@item "GNUNET_FORCE_LOGFILE" will completely override the location of the file
to log messages to. It should contain a relative or absolute file name. Setting
GNUNET_FORCE_LOGFILE is equivalent to passing "--log-file=logfile" or "-l
logfile" option (see below). It supports "[]" format in file names, but not
"@{@}" (see below).
@end itemize


Because environment variables are inherited by child processes when they are
launched, starting or re-starting the ARM service with these variables will
propagate them to all other services.

"GNUNET_LOG" and "GNUNET_FORCE_LOG" variables must contain a specially
formatted @strong{logging definition} string, which looks like this:

@example
[component];[file];[function];[from_line[-to_line]];loglevel[/component...]
@end example

That is, a logging definition consists of definition entries, separated by
slashes ('/'). If only one entry is present, there is no need to add a slash
to its end (although it is not forbidden either). All definition fields
(component, file, function, lines and loglevel) are mandatory, but (except for
the loglevel) they can be empty. An empty field means "match anything". Note
that even if fields are empty, the semicolon (';') separators must be
present.@ The loglevel field is mandatory, and must contain one of the log
level names (ERROR, WARNING, INFO or DEBUG).@ The lines field might contain
one non-negative number, in which case it matches only one line, or a range
"from_line-to_line", in which case it matches any line in the interval
[from_line;to_line] (that is, including both start and end line).@ GNUnet
mostly defaults component name to the name of the service that is implemented
in a process ('transport', 'core', 'peerinfo', etc), but logging calls can
specify custom component names using @code{GNUNET_log_from}.@ File name and
function name are provided by the compiler (__FILE__ and __FUNCTION__
built-ins).
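The shape of such a definition entry can be illustrated with a small
self-contained parser (a sketch of the format only, not GNUnet's actual
implementation; all names are made up):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses ONE entry "component;file;function;lines;LOGLEVEL".
   Empty fields mean "match anything"; only the loglevel is mandatory.
   Illustrative sketch, not GNUnet's parser. */
struct LogDef
{
  char component[64];
  char file[64];
  char function[64];
  int from_line; /* -1: match any line */
  int to_line;
  char level[16];
};

static int
parse_entry (const char *s, struct LogDef *d)
{
  char buf[256];
  char *fields[5];
  unsigned int i = 1;

  snprintf (buf, sizeof (buf), "%s", s);
  fields[0] = buf;
  for (char *p = buf; ('\0' != *p) && (i < 5); p++)
    if (';' == *p)
    {
      *p = '\0';
      fields[i++] = p + 1;
    }
  if ((5 != i) || ('\0' == *fields[4]))
    return 0; /* wrong number of fields, or missing mandatory loglevel */
  snprintf (d->component, sizeof (d->component), "%s", fields[0]);
  snprintf (d->file, sizeof (d->file), "%s", fields[1]);
  snprintf (d->function, sizeof (d->function), "%s", fields[2]);
  d->from_line = -1;
  d->to_line = -1;
  if ('\0' != *fields[3])
    if (1 == sscanf (fields[3], "%d-%d", &d->from_line, &d->to_line))
      d->to_line = d->from_line; /* a single line, not a range */
  snprintf (d->level, sizeof (d->level), "%s", fields[4]);
  return 1;
}
```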

Component, file and function fields are interpreted as non-extended regular
expressions (GNU libc regex functions are used). Matching is case-sensitive, ^
and $ will match the beginning and the end of the text. If a field is empty,
its contents are automatically replaced with a ".*" regular expression, which
matches anything. Matching is done in the default way, which means that the
expression matches as long as it's contained anywhere in the string. Thus
"GNUNET_" will match both "GNUNET_foo" and "BAR_GNUNET_BAZ". Use '^' and/or '$'
to make sure that the expression matches at the start and/or at the end of the
string.@ The semicolon (';') can't be escaped, and GNUnet will not use it in
component names (it can't be used in function names and file names anyway).@

@end table


Every logging call in GNUnet code will be (at run time) matched against the
log definitions passed to the process. If a log definition's fields match the
call arguments, then the call's log level is compared to the log level of
that definition. If the call's log level is less than or equal to the
definition's log level, the call is allowed to proceed. Otherwise the logging
call is forbidden, and nothing is logged. If no definitions matched at all,
GNUnet will use the global log level or (if a global log level is not
specified) will default to WARNING (that is, it will allow the call to
proceed if its level is less than or equal to the global log level or to
WARNING).
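The decision procedure amounts to a first-match scan followed by a level
comparison; a compact self-contained sketch (illustrative, not GNUnet's code):

```c
#include <assert.h>

/* Illustrative sketch of the run-time decision: scan definitions left
   to right, the first one matching the call site decides; a call
   proceeds iff its level is less than or equal to that definition's
   level.  With no match, the fallback (global level or WARNING)
   applies. */
enum Level { ERROR = 0, WARNING = 1, INFO = 2, DEBUG = 3 };

static int
call_allowed (enum Level call_level,
              const enum Level *def_levels, /* level of each definition */
              const int *matches,           /* does definition i match? */
              unsigned int n_defs,
              enum Level fallback)
{
  for (unsigned int i = 0; i < n_defs; i++)
    if (matches[i])
      return call_level <= def_levels[i];
  return call_level <= fallback;
}
```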

That is, definitions are evaluated from left to right, and the first matching
definition is used to allow or deny the logging call. Thus it is advised to
place narrow definitions at the beginning of the logdef string, and generic
definitions at the end.

Whether a call is allowed or not is only decided the first time this particular
call is made. The evaluation result is then cached, so that any attempts to
make the same call later will be allowed or disallowed right away. Because of
this caching, runtime log level evaluation should not significantly affect the
process performance. Log definition parsing is only done once, at the first
call to GNUNET_log_setup () made by the process (which is usually done soon
after it starts).

At the moment of writing there is no way to specify logging definitions from
configuration files, only via environment variables.

At the moment GNUnet will stop processing a log definition when it encounters
an error in definition formatting or an error in regular expression syntax, and
will not report the failure in any way.


@c ***************************************************************************
@menu
* Examples::
* Log files::
* Updated behavior of GNUNET_log::
@end menu

@node Examples
@subsubsection Examples

@table @asis

@item @code{GNUNET_FORCE_LOG=";;;;DEBUG" gnunet-arm -s} Start GNUnet process
tree, running all processes with DEBUG level (one should be careful with this,
as log files will grow at an alarming rate!)
@item @code{GNUNET_FORCE_LOG="core;;;;DEBUG" gnunet-arm -s} Start GNUnet process
tree, running the core service under DEBUG level (everything else will use
configured or default level).
@item @code{GNUNET_FORCE_LOG=";gnunet-service-transport_validation.c;;;DEBUG" gnunet-arm -s}
Start GNUnet process tree, allowing any logging calls from
gnunet-service-transport_validation.c (everything else will use configured or
default level).
@item @code{GNUNET_FORCE_LOG="fs;gnunet-service-fs_push.c;;;DEBUG" gnunet-arm -s}
Start GNUnet process tree, allowing any logging calls from
gnunet-service-fs_push.c (everything else will use configured or default
level).
@item @code{GNUNET_FORCE_LOG=";;GNUNET_NETWORK_socket_select;;DEBUG" gnunet-arm -s}
Start GNUnet process tree, allowing any logging calls from the
GNUNET_NETWORK_socket_select function (everything else will use configured or
default level).
@item @code{GNUNET_FORCE_LOG="transport.*;;.*send.*;;DEBUG/;;;;WARNING" gnunet-arm -s}
Start GNUnet process tree, allowing any logging calls from the components
that have "transport" in their names, and are made from function that have
"send" in their names. Everything else will be allowed to be logged only if it
has WARNING level.
@end table


On Windows, one can use batch files to run GNUnet processes with special
environment variables, without affecting the whole system. Such a batch file
will look like this:

@example
set GNUNET_FORCE_LOG=;;do_transmit;;DEBUG
gnunet-arm -s
@end example

(note the absence of double quotes in the environment variable definition, as
opposed to earlier examples, which use the shell). Another limitation on
Windows: GNUNET_FORCE_LOGFILE @strong{MUST} be set in order for
GNUNET_FORCE_LOG to work.


@c ***************************************************************************
@node Log files
@subsubsection Log files

GNUnet can be told to log everything into a file instead of stderr (which is
the default) using the "--log-file=logfile" or "-l logfile" option. This
option can be passed via the command line, or via the "OPTIONS" and
"GLOBAL_POSTFIX" configuration keys (see above). The file name passed with
this option is
subject to GNUnet filename expansion. If specified in "GLOBAL_POSTFIX", it is
also subject to ARM service filename expansion, in particular, it may contain
"@{@}" (left and right curly brace) sequence, which will be replaced by ARM
with the name of the service. This is used to keep logs from more than one
service separate, while only specifying one template containing "@{@}" in
GLOBAL_POSTFIX.

As part of a secondary file name expansion, the first occurrence of the "[]"
sequence ("left square brace" followed by "right square brace") in the file
name will be replaced with the process identifier of the process when it
initializes its logging subsystem. As a result, all processes will log into
different files. This is convenient for isolating messages of a particular
process, and prevents I/O races when multiple processes try to write into the
file at the same time. This expansion is done independently of "@{@}"
expansion that ARM service does (see above).

The log file name that is specified via "-l" can contain format characters
from the 'strftime' function family. For example, "%Y" will be replaced with
the current year. Using "basename-%Y-%m-%d.log" would include the current
year, month and day in the log file. If a GNUnet process runs for long enough
to need more than one log file, it will eventually clean up old log files.
Currently, only the last three log files (plus the current log file) are
preserved. So once the fifth log file goes into use (so after 4 days if you
use "%Y-%m-%d" as above), the first log file will be automatically deleted.
Note that if your log file name only contains "%Y", then log files would be
kept for 4 years and the logs from the first year would be deleted once year 5
begins. If you do not use any date-related string format codes, logs would
never be automatically deleted by GNUnet.
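The two expansions can be sketched in plain C (the helper name is illustrative
and the implementation is a simplification, not GNUnet's code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Sketch of the expansions applied to a "-l" log file name: the first
   "[]" becomes the process identifier, and strftime-style codes such
   as "%Y-%m-%d" are expanded against the current time. */
static void
expand_logfile_name (const char *name, char *out, size_t out_len)
{
  char with_pid[256];
  const char *brackets = strstr (name, "[]");

  if (NULL != brackets)
    snprintf (with_pid, sizeof (with_pid), "%.*s%d%s",
              (int) (brackets - name), name,
              (int) getpid (), brackets + 2);
  else
    snprintf (with_pid, sizeof (with_pid), "%s", name);
  time_t now = time (NULL);
  strftime (out, out_len, with_pid, localtime (&now));
}
```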


@c ***************************************************************************

@node Updated behavior of GNUNET_log
@subsubsection Updated behavior of GNUNET_log

It's currently quite common to see constructions like this all over the code:
@example
#if MESH_DEBUG
  GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
              "MESH: client disconnected\n");
#endif
@end example

The reason for the #if is not to avoid displaying the message when disabled
(GNUNET_ERROR_TYPE takes care of that), but to avoid the compiler including it
in the binary at all, when compiling GNUnet for platforms with restricted
storage space / memory (MIPS routers, ARM plug computers / dev boards, etc).

This presents several problems: the code gets ugly, hard to write and it is
very easy to forget to include the #if guards, creating non-consistent code. A
new change in GNUNET_log aims to solve these problems.

@strong{This change requires to @code{./configure} with at least
@code{--enable-logging=verbose} to see debug messages.}

Here is an example of code with dense debug statements:
@example
switch (restrict_topology)
@{
case GNUNET_TESTING_TOPOLOGY_CLIQUE:
#if VERBOSE_TESTING
  GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
              _("Blacklisting all but clique topology\n"));
#endif
  unblacklisted_connections =
    create_clique (pg, &remove_connections, BLACKLIST, GNUNET_NO);
  break;
case GNUNET_TESTING_TOPOLOGY_SMALL_WORLD_RING:
#if VERBOSE_TESTING
  GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
              _("Blacklisting all but small world (ring) topology\n"));
#endif
  unblacklisted_connections =
    create_small_world_ring (pg, &remove_connections, BLACKLIST);
  break;
@end example


Pretty hard to follow, huh?

From now on, it is not necessary to include the #if / #endif statements to
achieve the same behavior. The GNUNET_log and GNUNET_log_from macros take care
of it for you, depending on the configure option:
@itemize @bullet
@item If @code{--enable-logging} is set to @code{no}, the binary will contain
no log messages at all.
@item If @code{--enable-logging} is set to @code{yes}, the binary will contain
no DEBUG messages, and therefore running with -L DEBUG will have no effect.
Other messages (ERROR, WARNING, INFO, etc) will be included.
@item If @code{--enable-logging} is set to @code{verbose} or
@code{veryverbose}, the binary will contain DEBUG messages (still, it will be
necessary to run with -L DEBUG or set the DEBUG config option to show them).
@end itemize


If you are a developer:
@itemize @bullet
@item please make sure that you @code{./configure
--enable-logging=@{verbose,veryverbose@}}, so you can see DEBUG messages.
@item please remove the @code{#if} statements around @code{GNUNET_log
(GNUNET_ERROR_TYPE_DEBUG, ...)} lines, to improve the readability of your
code.
@end itemize

Since activating DEBUG now automatically makes logging verbose and enables
@strong{all} debug messages by default, you probably want to use the
https://gnunet.org/logging functionality to filter only relevant messages. A
suitable configuration could be:

@example
$ export GNUNET_FORCE_LOG="^YOUR_SUBSYSTEM$;;;;DEBUG/;;;;WARNING"
@end example

This will behave almost like enabling DEBUG in that subsystem before the
change. Of course you can adapt it to your particular needs; this is only a
quick example.

@c ***************************************************************************
@node Interprocess communication API (IPC)
@subsection Interprocess communication API (IPC)

In GNUnet, a variety of new message types might be defined and used in
interprocess communication. In this tutorial, we use the @code{struct
AddressLookupMessage} as an example to introduce how to construct our own
message type in GNUnet and how to implement message communication between a
service and a client. (Here, a client uses the @code{struct
AddressLookupMessage} as a request to ask the server to return the address of
any other peer connected to the service.)


@c ***************************************************************************
@menu
* Define new message types::
* Define message struct::
* Client - Establish connection::
* Client - Initialize request message::
* Client - Send request and receive response::
* Server - Startup service::
* Server - Add new handles for specified messages::
* Server - Process request message::
* Server - Response to client::
* Server - Notification of clients::
* Conversion between Network Byte Order (Big Endian) and Host Byte Order::
@end menu

@node Define new message types
@subsubsection Define new message types

First of all, you should define the new message type in
@code{gnunet_protocols.h}:
@example
// Request to look up addresses of peers in server.
#define GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP 29

// Response to the address lookup request.
#define GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY 30
@end example

@c ***************************************************************************
@node Define message struct
@subsubsection Define message struct

After the type definition, the specified message structure should also be
described in the header file, e.g. transport.h in our case.
@example
GNUNET_NETWORK_STRUCT_BEGIN

struct AddressLookupMessage
@{
  struct GNUNET_MessageHeader header;
  int32_t numeric_only GNUNET_PACKED;
  struct GNUNET_TIME_AbsoluteNBO timeout;
  uint32_t addrlen GNUNET_PACKED;
  /* followed by 'addrlen' bytes of the actual address, then
     followed by the 0-terminated name of the transport */
@};

GNUNET_NETWORK_STRUCT_END
@end example


Please note @code{GNUNET_NETWORK_STRUCT_BEGIN} and @code{GNUNET_PACKED},
which together ensure correct alignment when sending structs over the
network.


@c ***************************************************************************
@node Client - Establish connection
@subsubsection Client - Establish connection
@c %**end of header


At first, on the client side, the underlying API is employed to create a new
connection to a service; in our example, we connect to the transport service.
@example
struct GNUNET_CLIENT_Connection *client;

client = GNUNET_CLIENT_connect ("transport", cfg);
@end example

@c ***************************************************************************
@node Client - Initialize request message
@subsubsection Client - Initialize request message
@c %**end of header

When the connection is ready, we initialize the message. In this step, all
fields of the message should be properly initialized, namely the size, type,
and the user-defined data, such as the timeout, the address and the name of
the transport.
@example
struct AddressLookupMessage *msg;
size_t len = sizeof (struct AddressLookupMessage)
             + addressLen + strlen (nameTrans) + 1;

msg = GNUNET_malloc (len);
msg->header.size = htons (len);
msg->header.type = htons (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP);
msg->timeout = GNUNET_TIME_absolute_hton (abs_timeout);
msg->addrlen = htonl (addressLen);
char *addrbuf = (char *) &msg[1];
memcpy (addrbuf, address, addressLen);
char *tbuf = &addrbuf[addressLen];
memcpy (tbuf, nameTrans, strlen (nameTrans) + 1);
@end example

Note that the functions @code{htonl}, @code{htons} and
@code{GNUNET_TIME_absolute_hton} are applied to convert values from host byte
order into network byte order (big endian). For the usage of the byte orders
and the corresponding conversion functions, please refer to the section on
Conversion between Network Byte Order (Big Endian) and Host Byte Order.

@c ***************************************************************************
@node Client - Send request and receive response
@subsubsection Client - Send request and receive response
@c %**end of header

FIXME: This is very outdated, see the tutorial for the
current API!

Next, the client sends the constructed message as a request to the service
and waits for the response from the service. To accomplish this goal, there
are a number of API calls that can be used. In this example,
@code{GNUNET_CLIENT_transmit_and_get_response} is chosen as the most
appropriate function to use.
@example
GNUNET_CLIENT_transmit_and_get_response (client,
                                         &msg->header,
                                         timeout,
                                         GNUNET_YES,
                                         &address_response_processor,
                                         arp_ctx);
@end example

The argument @code{address_response_processor} is a function of type
@code{GNUNET_CLIENT_MessageHandler}, which is used to process the reply
message from the service.
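
For illustration, a minimal handler matching the (outdated)
@code{GNUNET_CLIENT_MessageHandler} signature might look as follows. This is
a hypothetical sketch, not the actual transport code:
@example
/* Sketch: a handler with the GNUNET_CLIENT_MessageHandler signature.
   'cls' is the arp_ctx closure from the call above. */
static void
address_response_processor (void *cls,
                            const struct GNUNET_MessageHeader *msg)
@{
  if ( (NULL == msg) ||
       (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY != ntohs (msg->type)) )
    return; /* timeout, disconnect or unexpected reply */
  /* ... extract the address from the reply here ... */
@}
@end example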

@node Server - Startup service
@subsubsection Server - Startup service

On the server side, to receive and process the request message, we run a
standard GNUnet service startup sequence using @code{GNUNET_SERVICE_run}, as
follows:
@example
int
main (int argc, char **argv)
@{
  return (GNUNET_OK ==
          GNUNET_SERVICE_run (argc, argv, "transport",
                              GNUNET_SERVICE_OPTION_NONE,
                              &run, NULL)) ? 0 : 1;
@}
@end example

@c ***************************************************************************
@node Server - Add new handles for specified messages
@subsubsection Server - Add new handles for specified messages
@c %**end of header

In the function above, the argument @code{run} is used to initialize the
transport service, and is defined like this:
@example
static void
run (void *cls,
     struct GNUNET_SERVER_Handle *serv,
     const struct GNUNET_CONFIGURATION_Handle *cfg)
@{
  GNUNET_SERVER_add_handlers (serv, handlers);
@}
@end example


Here, @code{GNUNET_SERVER_add_handlers} must be called in the run function to
add new handlers in the service. The parameter @code{handlers} is a list of
@code{struct GNUNET_SERVER_MessageHandler} to tell the service which function
should be called when a particular type of message is received, and should be
defined in this way:
@example
static struct GNUNET_SERVER_MessageHandler handlers[] = @{
  @{&handle_start, NULL, GNUNET_MESSAGE_TYPE_TRANSPORT_START, 0@},
  @{&handle_send, NULL, GNUNET_MESSAGE_TYPE_TRANSPORT_SEND, 0@},
  @{&handle_try_connect, NULL, GNUNET_MESSAGE_TYPE_TRANSPORT_TRY_CONNECT,
   sizeof (struct TryConnectMessage)@},
  @{&handle_address_lookup, NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP, 0@},
  @{NULL, NULL, 0, 0@}
@};
@end example


As shown, the first member of each struct is the callback function, which is
called to process the message type given as the third member. The second
member is the closure for the callback function, which is @code{NULL} in most
cases. The last member is the expected size of a message of this type;
usually we set it to 0 to accept messages of variable size, but for special
cases the exact size of the message can also be specified. The array must be
terminated with the entry @code{@{NULL, NULL, 0, 0@}}.

@c ***************************************************************************
@node Server - Process request message
@subsubsection Server - Process request message
@c %**end of header

After the initialization of the transport service, the request message can be
processed. Before handling the main message data, the validity of the message
should be checked, e.g., whether the size of the message is correct:
@example
size = ntohs (message->size);
if (size < sizeof (struct AddressLookupMessage))
@{
  GNUNET_break_op (0);
  GNUNET_SERVER_receive_done (client, GNUNET_SYSERR);
  return;
@}
@end example


Note that, in contrast to the construction of the request message in the
client, on the server the functions @code{ntohl} and @code{ntohs} should be
used when extracting data from the message, so that values in network byte
order (big endian) are converted back into host byte order. For more details,
please refer to the section on Conversion between Network Byte Order (Big
Endian) and Host Byte Order.

Moreover in this example, the name of the transport stored in the message is a
0-terminated string, so we should also check whether the name of the transport
in the received message is 0-terminated:
@example
nameTransport = (const char *) &address[addressLen];
if (nameTransport[size - sizeof (struct AddressLookupMessage)
                  - addressLen - 1] != '\0')
@{
  GNUNET_break_op (0);
  GNUNET_SERVER_receive_done (client, GNUNET_SYSERR);
  return;
@}
@end example

Here, @code{GNUNET_SERVER_receive_done} should be called to tell the service
that the request has been processed and the next message can be received. The
argument @code{GNUNET_SYSERR} indicates that the service did not understand
the request message and that the processing of this request should be
terminated. If the argument is @code{GNUNET_OK} instead, the service will
continue to process further request messages from this client.

@c ***************************************************************************
@node Server - Response to client
@subsubsection Server - Response to client
@c %**end of header

Once the processing of the current request is done, the server should send a
response to the client. A new @code{struct AddressLookupMessage} is produced
by the server in a similar way as in the client and sent to the client, but
here the type should be
@code{GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY} rather than
@code{GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP}.
@example
struct AddressLookupMessage *msg;
size_t len = sizeof (struct AddressLookupMessage)
             + addressLen + strlen (nameTrans) + 1;

msg = GNUNET_malloc (len);
msg->header.size = htons (len);
msg->header.type = htons (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY);

// ...

struct GNUNET_SERVER_TransmitContext *tc;

tc = GNUNET_SERVER_transmit_context_create (client);
GNUNET_SERVER_transmit_context_append_data (tc, NULL, 0,
    GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY);
GNUNET_SERVER_transmit_context_run (tc, rtimeout);
@end example


Note that there are also a number of other APIs provided for the service to
send messages.

@c ***************************************************************************
@node Server - Notification of clients
@subsubsection Server - Notification of clients
@c %**end of header

Often a service needs to (repeatedly) transmit notifications to a client or a
group of clients. In these cases, the client typically has once registered for
a set of events and then needs to receive a message whenever such an event
happens (until the client disconnects). The use of a notification context can
help manage message queues to clients and handle disconnects. Notification
contexts can be used to send individualized messages to a particular client or
to broadcast messages to a group of clients. An individualized notification
might look like this:
@example
GNUNET_SERVER_notification_context_unicast (nc, client, msg, GNUNET_YES);
@end example


Note that after processing the original registration message for notifications,
the server code still typically needs to call@
@code{GNUNET_SERVER_receive_done} so that the client can transmit further
messages to the server.
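
As a sketch (assuming the historical notification-context API; the queue
length of 16 is an arbitrary choice), the typical life cycle looks like this:
@example
/* Create the context once at service startup. */
static struct GNUNET_SERVER_NotificationContext *nc;

nc = GNUNET_SERVER_notification_context_create (server, 16);

/* When a client registers for events, add it to the context: */
GNUNET_SERVER_notification_context_add (nc, client);

/* Later, broadcast an event message to all registered clients;
   GNUNET_NO means the message must not be dropped on queue overflow. */
GNUNET_SERVER_notification_context_broadcast (nc, &msg->header, GNUNET_NO);
@end example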

@c ***************************************************************************
@node Conversion between Network Byte Order (Big Endian) and Host Byte Order
@subsubsection Conversion between Network Byte Order (Big Endian) and Host Byte Order
@c %** subsub? it's a referenced page on the ipc document.
@c %**end of header

Here, "big endian" corresponds to Network Byte Order, while Host Byte Order
is whatever byte order the local machine uses natively. What is the
difference between the two?

On a little-endian host (such as most x86 machines), a multi-byte integer,
for example a 4-byte @code{int} in RAM, is stored with its least significant
byte at the lowest memory address and its most significant byte at the
highest address. Network Byte Order is big endian and takes the opposite
approach: the most significant byte is stored at the lowest address.

When two hosts communicate over the network, they exchange data packets. To
ensure that both sides interpret multi-byte values identically regardless of
their native byte order, values must be converted into Network Byte Order
before sending and converted back into Host Byte Order after receiving.

There are ten convenient functions for byte-order conversion in GNUnet, as
follows:
@table @asis

@item @code{uint16_t htons (uint16_t hostshort)}
Convert a short int from host byte order to network byte order.
@item @code{uint32_t htonl (uint32_t hostlong)}
Convert a long int from host byte order to network byte order.
@item @code{uint16_t ntohs (uint16_t netshort)}
Convert a short int from network byte order to host byte order.
@item @code{uint32_t ntohl (uint32_t netlong)}
Convert a long int from network byte order to host byte order.
@item @code{unsigned long long GNUNET_ntohll (unsigned long long netlonglong)}
Convert a long long int from network byte order to host byte order.
@item @code{unsigned long long GNUNET_htonll (unsigned long long hostlonglong)}
Convert a long long int from host byte order to network byte order.
@item @code{struct GNUNET_TIME_RelativeNBO GNUNET_TIME_relative_hton (struct GNUNET_TIME_Relative a)}
Convert relative time to network byte order.
@item @code{struct GNUNET_TIME_Relative GNUNET_TIME_relative_ntoh (struct GNUNET_TIME_RelativeNBO a)}
Convert relative time from network byte order.
@item @code{struct GNUNET_TIME_AbsoluteNBO GNUNET_TIME_absolute_hton (struct GNUNET_TIME_Absolute a)}
Convert absolute time to network byte order.
@item @code{struct GNUNET_TIME_Absolute GNUNET_TIME_absolute_ntoh (struct GNUNET_TIME_AbsoluteNBO a)}
Convert absolute time from network byte order.
@end table
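
The standard conversion functions can be exercised with a small
self-contained C program (plain POSIX, no GNUnet headers needed); the round
trip is an identity on any host, while the in-memory byte layout after
@code{htonl} is the same everywhere:
@example
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
@{
  uint32_t host_value = 0x11223344;
  uint32_t net_value = htonl (host_value);   /* to network byte order */

  /* The round trip must restore the original value on any platform. */
  assert (ntohl (net_value) == host_value);
  assert (ntohs (htons ((uint16_t) 0xABCD)) == (uint16_t) 0xABCD);

  /* The first byte in memory of 'net_value' is always the most
     significant byte (0x11), regardless of the host's byte order. */
  assert (0x11 == ((unsigned char *) &net_value)[0]);
  printf ("byte order round trip OK\n");
  return 0;
@}
@end example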

@c ***************************************************************************

@node Cryptography API
@subsection Cryptography API
@c %**end of header

The gnunetutil APIs provide the cryptographic primitives used in GNUnet.
GNUnet uses 2048-bit RSA keys for the session key exchange, for signing
messages by peers, and for most other public-key operations. Most researchers
in cryptography consider 2048-bit RSA keys secure and practically unbreakable
for a long time. The API provides functions to create a fresh key pair, read a
private key from a file (or create a new file if the file does not exist),
encrypt, decrypt, sign, verify, and extract the public key into a format
suitable for network transmission.

For the encryption of files and the actual data exchanged between peers,
GNUnet uses 256-bit AES encryption. Fresh session keys are negotiated for
every new connection.@ Again, there is no published technique to break this
cipher in any realistic amount of time. The API provides functions for the
generation of keys, the validation of keys (important for checking that
decryptions using RSA succeeded), encryption and decryption.

GNUnet uses SHA-512 for computing one-way hash codes. The API provides
functions to compute a hash over a block in memory or over a file on disk.

The crypto API also provides functions for randomizing a block of memory,
obtaining a single random number and for generating a permutation of the numbers
0 to n-1. Random number generation distinguishes between WEAK and STRONG random
number quality; WEAK random numbers are pseudo-random whereas STRONG random
numbers use entropy gathered from the operating system.

Finally, the crypto API provides a means to deterministically generate a
1024-bit RSA key from a hash code. These functions should most likely not be
used by most applications; most importantly,@
GNUNET_CRYPTO_rsa_key_create_from_hash does not create an RSA-key that should
be considered secure for traditional applications of RSA.

@c ***************************************************************************
@node Message Queue API
@subsection Message Queue API
@c %**end of header

@strong{ Introduction }@ Often, applications need to queue messages that are
to be sent to other GNUnet peers, clients or services. Since all of GNUnet's
message-based communication APIs, by design, do not allow messages to be
queued, it is common to implement custom message queues manually when they
are needed. However, writing very similar code in multiple places is tedious
and leads to code duplication.

MQ (for Message Queue) is an API that provides the functionality to implement
and use message queues. We intend to eventually replace all of the custom
message queue implementations in GNUnet with MQ.

@strong{ Basic Concepts }@ The two most important entities in MQ are queues and
envelopes.

Every queue is backed by a specific implementation (e.g. for mesh, stream,
connection, server client, etc.) that will actually deliver the queued
messages. For convenience,@ some queues also allow a list of message handlers
to be specified. The message queue will then also wait for incoming messages
and dispatch them appropriately.

An envelope holds the memory for a message, as well as metadata (Where is the
envelope queued? What should happen after it has been sent?). Any envelope
can only be queued in one message queue.

@strong{ Creating Queues }@ The following is a list of currently available
message queues. Note that to avoid layering issues, message queues for higher
level APIs are not part of @code{libgnunetutil}, but@ the respective API itself
provides the queue implementation.
@table @asis

@item @code{GNUNET_MQ_queue_for_connection_client} Transmits queued messages
over a @code{GNUNET_CLIENT_Connection}@ handle. Also supports receiving with
message handlers.@

@item @code{GNUNET_MQ_queue_for_server_client} Transmits queued messages over a
@code{GNUNET_SERVER_Client}@ handle. Does not support incoming message
handlers.@

@item @code{GNUNET_MESH_mq_create} Transmits queued messages over a
@code{GNUNET_MESH_Tunnel}@ handle. Does not support incoming message handlers.@

@item @code{GNUNET_MQ_queue_for_callbacks} This is the most general
implementation. Instead of delivering and receiving messages with one of
GNUnet's communication APIs, implementation callbacks are called. Refer to
"Implementing Queues" for a more detailed explanation.
@end table


@strong{ Allocating Envelopes }@ A GNUnet message (as defined by the
GNUNET_MessageHeader) has three parts: The size, the type, and the body.

MQ provides macros to allocate an envelope containing a message conveniently,@
automatically setting the size and type fields of the message.

Consider the following simple message, with the body consisting of a single
number value:
@example
struct NumberMessage
@{
  /** Type: GNUNET_MESSAGE_TYPE_EXAMPLE_1 */
  struct GNUNET_MessageHeader header;
  uint32_t number GNUNET_PACKED;
@};
@end example

An envelope containing an instance of the NumberMessage can be constructed like
this:
@example
struct GNUNET_MQ_Envelope *ev;
struct NumberMessage *msg;

ev = GNUNET_MQ_msg (msg, GNUNET_MESSAGE_TYPE_EXAMPLE_1);
msg->number = htonl (42);
@end example


In the above code, @code{GNUNET_MQ_msg} is a macro. The return value is the
newly allocated envelope. The first argument must be a pointer to some
@code{struct} containing a @code{struct GNUNET_MessageHeader header} field,
while the second argument is the desired message type, in host byte order.

The @code{msg} pointer now points to an allocated message, where the message
type and the message size are already set. The message's size is inferred from
the type of the @code{msg} pointer: It will be set to 'sizeof(*msg)', properly
converted to network byte order.

If the message body's size is dynamic, the macro @code{GNUNET_MQ_msg_extra}
can be used to allocate an envelope whose message has additional space
allocated after the @code{msg} structure.

If no structure has been defined for the message,
@code{GNUNET_MQ_msg_header_extra} can be used to allocate additional space
after the message header. The first argument then must be a pointer to a
@code{GNUNET_MessageHeader}.
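
For example, a sketch using @code{GNUNET_MQ_msg_extra} might look like this
(here @code{payload} and @code{payload_len} are hypothetical variables
holding the dynamic part of the message):
@example
/* Sketch: allocate a NumberMessage followed by 'payload_len'
   extra bytes; the macro accounts for them in the message size. */
struct GNUNET_MQ_Envelope *ev;
struct NumberMessage *msg;

ev = GNUNET_MQ_msg_extra (msg, payload_len, GNUNET_MESSAGE_TYPE_EXAMPLE_1);
msg->number = htonl (42);
memcpy (&msg[1], payload, payload_len);
@end example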

@strong{Envelope Properties}@ A few functions in MQ allow setting additional
properties on envelopes:
@table @asis

@item @code{GNUNET_MQ_notify_sent} Allows specifying a function that will be
called once the envelope's message@ has been sent irrevocably. An envelope can
be canceled precisely up to the@ point where the notify sent callback has been
called.
@item @code{GNUNET_MQ_disable_corking} No corking will be used when
sending the message. Not every@ queue supports this flag; by default,
envelopes are sent with corking.@

@end table


@strong{Sending Envelopes}@ Once an envelope has been constructed, it can be
queued for sending with @code{GNUNET_MQ_send}.

Note that in order to avoid memory leaks, an envelope must either be sent (the
queue will free it) or destroyed explicitly with @code{GNUNET_MQ_discard}.
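
A minimal usage sketch (assuming @code{mq} is a @code{struct
GNUNET_MQ_Handle *} created with one of the queue constructors above, and
@code{ev} an envelope constructed as shown earlier):
@example
/* Queue the envelope; the queue takes ownership and frees 'ev'
   after it has been sent. */
GNUNET_MQ_send (mq, ev);

/* Alternatively, if the envelope is not to be sent after all: */
GNUNET_MQ_discard (ev);
@end example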

@strong{Canceling Envelopes}@ An envelope queued with @code{GNUNET_MQ_send} can
be canceled with @code{GNUNET_MQ_cancel}. Note that after the notify sent
callback has been called, canceling a message results in undefined behavior.
Thus it is unsafe to cancel an envelope that does not have a notify sent
callback. When canceling an envelope, it is not necessary@ to call
@code{GNUNET_MQ_discard}, and the envelope can't be sent again.

@strong{ Implementing Queues }@ @code{TODO}

@c ***************************************************************************
@node Service API
@subsection Service API
@c %**end of header

Most GNUnet code lives in the form of services. Services are processes that
offer an API for other components of the system to build on. Those other
components can be command-line tools for users, graphical user interfaces or
other services. Services provide their API using an IPC protocol. For this,
each service must listen on either a TCP port or a UNIX domain socket; the
service implementation does so using the server API. This use of the server
API is exposed directly to the users of the service API. Thus, when using the
service API, one is usually also using large parts of the server API. The
service API provides various convenience functions, such as parsing
command-line arguments and the configuration file, which are not found in the
server API.
The dual to the service/server API is the client API, which can be used to
access services.

The most common way to start a service is to use the GNUNET_SERVICE_run
function from the program's main function. GNUNET_SERVICE_run will then parse
the command line and configuration files and, based on the options found there,
start the server. It will then give back control to the main program, passing
the server and the configuration to the GNUNET_SERVICE_Main callback.
GNUNET_SERVICE_run will also take care of starting the scheduler loop. If this
is inappropriate (for example, because the scheduler loop is already running),
GNUNET_SERVICE_start and related functions provide an alternative to
GNUNET_SERVICE_run.

When starting a service, the service_name option is used to determine which
sections in the configuration file should be used to configure the service. A
typical value here is the name of the src/ sub-directory, for example
"statistics". The same string would also be given to GNUNET_CLIENT_connect to
access the service.

Once a service has been initialized, the program should use the@
GNUNET_SERVICE_Main callback to register message handlers using
GNUNET_SERVER_add_handlers. The service will already have registered a handler
for the "TEST" message.

The option bitfield (enum GNUNET_SERVICE_Options) determines how a service
should behave during shutdown. There are three key strategies:
@table @asis

@item instant (GNUNET_SERVICE_OPTION_NONE) Upon receiving the shutdown signal
from the scheduler, the service immediately terminates the server, closing all
existing connections with clients.
@item manual
(GNUNET_SERVICE_OPTION_MANUAL_SHUTDOWN) The service does nothing by itself
during shutdown. The main program will need to take the appropriate action by
calling GNUNET_SERVER_destroy or GNUNET_SERVICE_stop (depending on how the
service was initialized) to terminate the service. This method is used by
gnunet-service-arm and is rather uncommon.
@item soft
(GNUNET_SERVICE_OPTION_SOFT_SHUTDOWN) Upon receiving the shutdown signal from
the scheduler, the service immediately tells the server to stop listening for
incoming clients. Requests from normal existing clients are still processed and
the server/service terminates once all normal clients have disconnected.
Clients that are not expected to ever disconnect (such as clients that monitor
performance values) can be marked as 'monitor' clients using
GNUNET_SERVER_client_mark_monitor. Those clients will continue to be processed
until all 'normal' clients have disconnected. Then, the server will terminate,
closing the monitor connections. This mode is for example used by 'statistics',
allowing existing 'normal' clients to set (possibly persistent) statistic
values before terminating.
@end table

@c ***************************************************************************
@node Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps
@subsection Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps
@c %**end of header

A commonly used data structure in GNUnet is a (multi-)hash map. It is most
often used to map a peer identity to some data structure, but is also used to
map arbitrary keys to values (for example to track requests in the
distributed hash table or in file-sharing). As it is so commonly used, this
data structure is actually sometimes responsible for a large share of
GNUnet's overall memory consumption (for some processes, 30% is not
uncommon). The following text documents some API quirks (and their
implications for applications) that were recently introduced to minimize the
footprint of the hash map.


@c ***************************************************************************
@menu
* Analysis::
* Solution::
* Migration::
* Conclusion::
* Availability::
@end menu

@node Analysis
@subsubsection Analysis
@c %**end of header

The main reason for the "excessive" memory consumption by the hash map is that
GNUnet uses 512-bit cryptographic hash codes --- and the (multi-)hash map also
uses the same 512-bit 'struct GNUNET_HashCode'. As a result, storing just the
keys requires 64 bytes of memory for each key. As some applications like to
keep a large number of entries in the hash map (after all, that's what maps
are good for), 64 bytes per hash is significant: keeping a pointer to the
value and having a linked list for collisions consume between 8 and 16 bytes,
and 'malloc' may add about the same overhead per allocation, putting us in the
16 to 32 byte per entry ballpark. Adding a 64-byte key then triples the
overall memory requirement for the hash map.

To make things "worse", most of the time storing the key in the hash map is
not required: it is typically already in memory elsewhere! In most cases, the
values stored in the hash map are some application-specific struct that _also_
contains the hash. Here is a simplified example:
@example
struct MyValue
@{
  struct GNUNET_HashCode key;
  unsigned int my_data;
@};

// ...

val = GNUNET_malloc (sizeof (struct MyValue));
val->key = key;
val->my_data = 42;
GNUNET_CONTAINER_multihashmap_put (map, &key, val, ...);
@end example


This is a common pattern as later the entries might need to be removed, and at
that time it is convenient to have the key immediately at hand:
@example
GNUNET_CONTAINER_multihashmap_remove (map, &val->key, val);
@end example


Note that here we end up with two times 64 bytes for the key, plus maybe 64
bytes total for the rest of the 'struct MyValue' and the map entry in the hash
map. The resulting redundant storage of the key increases overall memory
consumption per entry from the "optimal" 128 bytes to 192 bytes. This is not
just an extreme example: overheads in practice are actually sometimes close to
those highlighted in this example. This is especially true for maps with a
significant number of entries, as there we tend to really try to keep the
entries small.
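
The arithmetic above can be checked with a few lines of plain C (the 64-byte
"rest" value is the ballpark estimate from the text for the remainder of
'struct MyValue' plus the map entry):
@example
#include <assert.h>
#include <stdio.h>

int
main (void)
@{
  const unsigned int key_size = 512 / 8; /* 64-byte GNUNET_HashCode */
  const unsigned int rest = 64;          /* rest of 'struct MyValue'
                                            plus the map entry */
  unsigned int with_key_copy = 2 * key_size + rest; /* key stored twice */
  unsigned int optimal = key_size + rest;           /* key stored once */

  assert (192 == with_key_copy);
  assert (128 == optimal);
  printf ("%u vs. %u bytes per entry\n", with_key_copy, optimal);
  return 0;
@}
@end example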
@c ***************************************************************************
@node Solution
@subsubsection Solution
@c %**end of header

The solution that has now been implemented is to @strong{optionally} allow the
hash map to not make a (deep) copy of the hash but instead keep a pointer to
the hash/key in the entry. This reduces the memory consumption for the key
from 64 bytes to 4 to 8 bytes. However, it can only work if the key is
actually stored in the entry (which is the case most of the time) and if the
entry does not modify the key (which, in all of the code I'm aware of, has
always been the case when the key is stored in the entry). Finally, when the
client stores an entry in the hash map, it @strong{must} provide a pointer to
the key within the entry, not just a pointer to a transient location of the
key. If the client code does not meet these requirements, the result is a
dangling pointer and undefined behavior of the (multi-)hash map API.
@c ***************************************************************************
@node Migration
@subsubsection Migration
@c %**end of header

To use the new feature, first check that the values contain the respective key
(and never modify it). Then, all calls to
@code{GNUNET_CONTAINER_multihashmap_put} on the respective map must be audited
and most likely changed to pass a pointer into the value's struct. For the
initial example, the new code would look like this:
@example
struct MyValue
@{
  struct GNUNET_HashCode key;
  unsigned int my_data;
@};

// ...

val = GNUNET_malloc (sizeof (struct MyValue));
val->key = key;
val->my_data = 42;
GNUNET_CONTAINER_multihashmap_put (map, &val->key, val, ...);
@end example


Note that @code{&key} was changed to @code{&val->key} in the argument to the
@code{put} call. This is critical, as often @code{key} is on the stack or in
some other transient data structure, and thus having the hash map keep a
pointer to @code{key} would not work. Only the key inside of @code{val} has
the same lifetime as the entry in the map (this must of course be checked as
well). Naturally, @code{val->key} must be initialized before the @code{put}
call. Once all @code{put} calls have been converted and double-checked, you
can change the call to create the hash map from
@example
map = GNUNET_CONTAINER_multihashmap_create (SIZE, GNUNET_NO);
@end example

to

@example
map = GNUNET_CONTAINER_multihashmap_create (SIZE, GNUNET_YES);
@end example

If everything was done correctly, you now use about 60 bytes less memory per
entry in @code{map}. However, if now (or in the future) any call to @code{put}
does not ensure that the given key is valid until the entry is removed from the
map, undefined behavior is likely to be observed.
@c ***************************************************************************
@node Conclusion
@subsubsection Conclusion
@c %**end of header

The new optimization is often applicable and can result in a reduction in
memory consumption of up to 30% in practice. However, it makes the code less
robust, as additional invariants are imposed on the multi-hash-map client.
Thus, applications should refrain from enabling the new mode unless the
resulting performance increase is deemed significant enough. In particular,
it should generally not be used in new code (wait at least until benchmarks
exist).
@c ***************************************************************************
@node Availability
@subsubsection Availability
@c %**end of header

The new multi hash map code was committed in SVN 24319 (will be in GNUnet
0.9.4). Various subsystems (transport, core, dht, file-sharing) were
previously audited and modified to take advantage of the new capability. In
particular, memory consumption of the file-sharing service is expected to drop
by 20-30% due to this change.

@c ***************************************************************************
@node The CONTAINER_MDLL API
@subsection The CONTAINER_MDLL API
@c %**end of header

This text documents the GNUNET_CONTAINER_MDLL API. The GNUNET_CONTAINER_MDLL
API is similar to the GNUNET_CONTAINER_DLL API in that it provides operations
for the construction and manipulation of doubly-linked lists. The key
difference to the (simpler) DLL-API is that the MDLL-version allows a single
element (instance of a "struct") to be in multiple linked lists at the same
time.

Like the DLL API, the MDLL API stores (most of) the data structures for the
doubly-linked list with the respective elements; only the 'head' and 'tail'
pointers are stored "elsewhere" --- and the application needs to provide the
locations of head and tail to each of the calls in the MDLL API. The key
difference for the MDLL API is that the "next" and "previous" pointers in the
struct can no longer be simply called "next" and "prev" --- after all, the
element may be in multiple doubly-linked lists, so we cannot just have one
"next" and one "prev" pointer!

The solution is to have multiple fields that must have a name of the format
"next_XX" and "prev_XX" where "XX" is the name of one of the doubly-linked
lists. Here is a simple example:
@example
struct MyMultiListElement
@{
  struct MyMultiListElement *next_ALIST;
  struct MyMultiListElement *prev_ALIST;
  struct MyMultiListElement *next_BLIST;
  struct MyMultiListElement *prev_BLIST;
  void *data;
@};
@end example


Note that by convention, we use all-uppercase letters for the list names. In
addition, the program needs to have a location for the head and tail pointers
for both lists, for example:
@example
static struct MyMultiListElement *head_ALIST;
static struct MyMultiListElement *tail_ALIST;
static struct MyMultiListElement *head_BLIST;
static struct MyMultiListElement *tail_BLIST;
@end example


Using the MDLL-macros, we can now insert an element into the ALIST:
@example
GNUNET_CONTAINER_MDLL_insert (ALIST, head_ALIST, tail_ALIST, element);
@end example


Passing "ALIST" as the first argument to MDLL specifies which of the next/prev
fields in the 'struct MyMultiListElement' should be used. The extra "ALIST"
argument and the "_ALIST" in the names of the next/prev-members are the only
differences between the MDLL- and the DLL-API. Like the DLL-API, the MDLL-API offers
functions for inserting (at head, at tail, after a given element) and removing
elements from the list. Iterating over the list should be done by directly
accessing the "next_XX" and/or "prev_XX" members.
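
The mechanics behind these macros can be illustrated with a small,
self-contained C sketch. The helper below is hypothetical; it hand-writes the
head insertion that the real @code{GNUNET_CONTAINER_MDLL_insert} macro
generates from the list name given as its first argument:

```c
#include <assert.h>
#include <stddef.h>

/* An element that can be in two lists ("ALIST" and "BLIST") at once. */
struct MyMultiListElement
{
  struct MyMultiListElement *next_ALIST;
  struct MyMultiListElement *prev_ALIST;
  struct MyMultiListElement *next_BLIST;
  struct MyMultiListElement *prev_BLIST;
  void *data;
};

/* Hypothetical hand-written head insertion for the ALIST; the MDLL macro
   expands to this kind of code, selecting the next_/prev_ fields based on
   the list name. */
static void
insert_head_ALIST (struct MyMultiListElement **head,
                   struct MyMultiListElement **tail,
                   struct MyMultiListElement *e)
{
  e->prev_ALIST = NULL;
  e->next_ALIST = *head;
  if (NULL != *head)
    (*head)->prev_ALIST = e;
  else
    *tail = e;               /* list was empty: e is also the new tail */
  *head = e;
}
```

The BLIST insertion would be identical except that it manipulates the
next_BLIST/prev_BLIST fields, which is how the same element can sit in both
lists independently.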

@c ***************************************************************************
@node The Automatic Restart Manager (ARM)
@section The Automatic Restart Manager (ARM)
@c %**end of header

GNUnet's Automatic Restart Manager (ARM) is the GNUnet service responsible for
system initialization and service babysitting. ARM starts and halts services,
detects configuration changes and restarts services affected by those changes
as needed. It is also responsible for restarting services in case of crashes,
and it is planned to eventually incorporate automatic debugging aids that give
developers insights into the reasons for a crash. The purpose of this document
is to give GNUnet developers an idea of how ARM works and how to interact
with it.

@menu
* Basic functionality::
* Key configuration options::
* Availability2::
* Reliability::
@end menu

@c ***************************************************************************
@node Basic functionality
@subsection Basic functionality
@c %**end of header

@itemize @bullet
@item ARM's source code can be found under "src/arm".@ Service processes are
managed by the functions in "gnunet-service-arm.c" (the main function in that
file is ARM's entry point), which in turn is controlled via the command-line
tool "gnunet-arm.c".

@item The functions responsible for communicating with ARM, and for starting
and stopping services (including the ARM service itself), are provided by the
ARM API "arm_api.c".@ The function GNUNET_ARM_connect() returns an ARM handle
to the caller after binding it to the caller's context (the configuration and
scheduler in use). This handle can afterwards be used by the caller to
communicate with ARM. The functions GNUNET_ARM_start_service() and
GNUNET_ARM_stop_service() are used for starting and stopping services,
respectively.

@item A typical example of using these basic ARM services can be found in the
file test_arm_api.c. The test case connects to ARM, starts it, then uses it to
start the "resolver" service, stops the "resolver" again and finally stops ARM
itself.
@end itemize

@c ***************************************************************************
@node Key configuration options
@subsection Key configuration options
@c %**end of header

Configuration options for ARM and its services should be provided in a .conf
file (as an example, see test_arm_api_data.conf). When running ARM, the
configuration file to use should be passed to the command:@ @code{@ $ gnunet-arm -s -c
configuration_to_use.conf@ }@ If no configuration is passed, the default
configuration file will be used (see GNUNET_PREFIX/share/gnunet/defaults.conf,
which is created from contrib/defaults.conf).@ Each service has a section named
after the service, enclosed in square brackets, for example:
"[arm]". The following options configure how ARM configures or interacts with
the various services:

@table @asis

@item PORT Port number on which the service listens for incoming TCP
connections. ARM will start the service on demand should it notice a
connection attempt on this port.

@item HOSTNAME Specifies on which host the service is deployed. Note
that ARM can only start services that are running on the local system (but will
not check that the hostname matches the local machine name). This option is
used by the @code{gnunet_client_lib.h} implementation to determine which system
to connect to. The default is "localhost".

@item BINARY The name of the service binary file.

@item OPTIONS Command-line options to be passed to the service.

@item PREFIX A command to prepend to the actual command, for example, to run
a service with "valgrind" or "gdb".

@item DEBUG Run in debug mode (very verbose).

@item AUTOSTART If set, ARM will listen on the UNIX domain socket and/or TCP
port of the service and start the service on demand.

@item FORCESTART ARM will always
start this service when the peer is started.

@item ACCEPT_FROM IPv4 addresses the service accepts connections from.

@item ACCEPT_FROM6 IPv6 addresses the service accepts connections from.

@end table


Options that impact the operation of ARM overall are in the "[arm]" section.
ARM is a normal service and has (except for AUTOSTART) all of the options that
other services do. In addition, ARM has the following options:
@table @asis

@item GLOBAL_PREFIX Command to be prepended to the command line of every
service that is going to run.@

@item GLOBAL_POSTFIX Global option that will be supplied to every service
that is going to run.@

@end table
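
To make these options concrete, here is a hypothetical configuration fragment
(the port number and the choice of valgrind as prefix are invented for this
example and not taken from an actual GNUnet installation):

@example
[resolver]
PORT = 2464
BINARY = gnunet-service-resolver
AUTOSTART = YES

[arm]
GLOBAL_PREFIX = valgrind
@end example

With this configuration, ARM would run every service under valgrind and start
the resolver on demand when a connection arrives on port 2464.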

@c ***************************************************************************
@node Availability2
@subsection Availability2
@c %**end of header

As mentioned before, one of the features provided by ARM is starting services
on demand. Consider the example of a "client" service that wants to connect
to a "server" service. The "client" asks ARM to run the "server", ARM starts
the "server", the "server" starts listening for incoming connections, the
"client" establishes a connection with the "server", and then the two start
to communicate.@ One problem with that scheme is that it is slow!@ The
"client" service wants to communicate with the "server" service immediately
and is not willing to wait for it to be started and listening for incoming
connections before serving its request.@ One solution to that problem would be
for ARM to start all services by default. That would solve the problem, yet it
is not practical, since some of the services started that way might never be
used, or only be used after a relatively long time.@ The approach followed by
ARM to solve this problem is as follows:
@itemize @bullet


@item For each service that has a PORT field in the configuration file (that
is, a service that accepts incoming connections from clients) and that is not
one of the default services, ARM creates listening sockets for all addresses
associated with that service.

@item The "client" immediately establishes a connection with the "server".

@item ARM --- pretending to be the "server" --- listens on the respective
port and notices the incoming connection from the "client", but does not
accept it.

@item Once there is an incoming connection, ARM starts the "server", passing
on the listen sockets; now the service is started and can do its work.

@item Other client services can now connect directly to the "server".
@end itemize
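
The core trick in the third step, noticing a pending connection on a listen
socket without accepting it, can be sketched with plain POSIX sockets. This is
an illustration of the mechanism only, not ARM's actual code:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a client is waiting on the listen socket 'lsock' (i.e. the
   socket became readable) without accepting the connection; this is the
   moment at which ARM would start the real service and hand over the
   listen socket. */
static int
client_is_waiting (int lsock)
{
  struct pollfd pfd = { .fd = lsock, .events = POLLIN };
  return (1 == poll (&pfd, 1, 100)) && (0 != (pfd.revents & POLLIN));
}
```

Because the connection is never accepted by ARM, the "client" simply
experiences a slightly delayed connection setup once the real service takes
over the socket.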

@c ***************************************************************************
@node Reliability
@subsection Reliability

One of the features provided by ARM is the automatic restart of crashed
services.@ ARM needs to know which of the running services died. The function
maint_child_death() in "gnunet-service-arm.c" is responsible for that. The
function is scheduled to run upon receiving a SIGCHLD signal. It then iterates
over ARM's list of running services to determine which services have died
(crashed), and ARM restarts each of them.@ Now consider the case of a service
with a serious problem that causes it to crash each time it is started by ARM.
If ARM kept blindly restarting such a service, we would get the pattern
start-crash-restart-crash-restart-crash and so forth, which is of course not
practical.@ For that reason, ARM schedules the service to be restarted only
after waiting for a delay that grows exponentially with each crash/restart of
that service.@ To clarify the idea, consider the following example:
@itemize @bullet


@item Service S crashed.

@item ARM receives the SIGCHLD and inspects its list of services to find the
dead one(s).

@item ARM finds S dead and schedules it for a restart after a "backoff" time,
which is initially set to 1 ms. ARM then doubles the backoff time
corresponding to S (now backoff(S) = 2 ms).

@item Because there is a severe problem with S, it crashes again.

@item Again, ARM receives the SIGCHLD and detects that it is S again that
crashed. ARM schedules it for a restart, but only after its new backoff time
(which is now 2 ms), and doubles its backoff time again (now backoff(S) = 4 ms).

@item This continues until backoff(S) reaches a certain threshold
(EXPONENTIAL_BACKOFF_THRESHOLD, set to half an hour). After reaching it,
backoff(S) remains at half an hour, so ARM will not spend a lot of time
trying to restart a problematic service.
@end itemize
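
The backoff doubling with a cap can be captured in a few lines of C. The
constant matches the half-hour threshold described above; the function name is
invented for this sketch and is not ARM's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Half an hour in milliseconds (EXPONENTIAL_BACKOFF_THRESHOLD). */
#define EXPONENTIAL_BACKOFF_THRESHOLD_MS (30ULL * 60 * 1000)

/* Compute the next restart delay for a service: double the current
   backoff, but never exceed the threshold. */
static uint64_t
next_backoff_ms (uint64_t cur_ms)
{
  uint64_t next = 2 * cur_ms;
  return (next > EXPONENTIAL_BACKOFF_THRESHOLD_MS)
    ? EXPONENTIAL_BACKOFF_THRESHOLD_MS
    : next;
}
```

Starting from 1 ms, the delay hits the half-hour cap after roughly 21
consecutive crashes and stays there for every further crash.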

@c ***************************************************************************
@node GNUnet's TRANSPORT Subsystem
@section GNUnet's TRANSPORT Subsystem
@c %**end of header

This chapter documents how the GNUnet transport subsystem works. The GNUnet
transport subsystem consists of three main components: the transport API (the
interface used by the rest of the system to access the transport service), the
transport service itself (most of the interesting functions, such as choosing
transports, happens here) and the transport plugins. A transport plugin is a
concrete implementation for how two GNUnet peers communicate; many plugins
exist, for example for communication via TCP, UDP, HTTP, HTTPS and others.
Finally, the transport subsystem uses supporting code, especially the NAT/UPnP
library to help with tasks such as NAT traversal.

Key tasks of the transport service include:
@itemize @bullet


@item Create our HELLO message, notify clients and neighbours if our HELLO
changes (using NAT library as necessary)

@item Validate HELLOs from other peers (send PING), allow other peers to
validate our HELLO's addresses (send PONG)

@item Upon request, establish connections to other peers (using address
selection from ATS subsystem) and maintain them (again using PINGs and PONGs)
as long as desired

@item Accept incoming connections, give ATS service the opportunity to switch
communication channels

@item Notify clients about peers that have connected to us or that have been
disconnected from us

@item If a (stateful) connection goes down unexpectedly (without explicit
DISCONNECT), quickly attempt to recover (without notifying clients) but do
notify clients quickly if reconnecting fails

@item Send (payload) messages arriving from clients to other peers via
transport plugins and receive messages from other peers, forwarding those to
clients

@item Enforce inbound traffic limits (using flow-control if it is applicable);
outbound traffic limits are enforced by CORE, not by us (!)

@item Enforce restrictions on P2P connection as specified by the blacklist
configuration and blacklisting clients
@end itemize


Note that the term "clients" in the list above really refers to the GNUnet-CORE
service, as CORE is typically the only client of the transport service.

@menu
* Address validation protocol::
@end menu

@node Address validation protocol
@subsection Address validation protocol
@c %**end of header

This section documents how the GNUnet transport service validates connections
with other peers. It is a high-level description of the protocol necessary to
understand the details of the implementation. It should be noted that when we
talk about PING and PONG messages in this section, we refer to transport-level
PING and PONG messages, which are different from core-level PING and PONG
messages (both in implementation and function).

The goal of transport-level address validation is to minimize the chances of a
successful man-in-the-middle attack against GNUnet peers on the transport
level. Such an attack would not allow the adversary to decrypt the P2P
transmissions, but a successful attacker could at least measure traffic volumes
and latencies (raising the adversaries capablities by those of a global passive
adversary in the worst case). The scenarios we are concerned about is an
attacker, Mallory, giving a HELLO to Alice that claims to be for Bob, but
contains Mallory's IP address instead of Bobs (for some transport). Mallory
would then forward the traffic to Bob (by initiating a connection to Bob and
claiming to be Alice). As a further complication, the scheme has to work even
if say Alice is behind a NAT without traversal support and hence has no address
of her own (and thus Alice must always initiate the connection to Bob).

An additional constraint is that HELLO messages do not contain a cryptographic
signature since other peers must be able to edit (i.e. remove) addresses from
the HELLO at any time (this was not true in GNUnet 0.8.x). A basic
@strong{assumption} is that each peer knows the set of possible network
addresses that it @strong{might} be reachable under (so for example, the
external IP address of the NAT plus the LAN address(es) with the respective
ports).

The solution is the following. If Alice wants to validate that a given address
for Bob is valid (i.e. is actually established @strong{directly} with the
intended target), she sends a PING message over that connection to Bob. Note
that in this case, Alice initiated the connection, so only she knows for sure
which address was used (Alice may be behind a NAT, so whatever address Bob
sees may not be an address Alice knows she has). Bob checks that the address
given in the PING is actually one of his addresses (does not belong to
Mallory), and if it is, sends back a PONG (with a signature that says that Bob
owns/uses the address from the PING). Alice checks the signature and is happy
if it is valid and the address in the PONG is the address she used. This is
similar to the 0.8.x protocol where the HELLO contained a signature from Bob
for each address used by Bob. Here, the purpose code for the signature is
@code{GNUNET_SIGNATURE_PURPOSE_TRANSPORT_PONG_OWN}. After this, Alice will
remember Bob's address and consider the address valid for a while (12h in the
current implementation). Note that after this exchange, Alice only considers
Bob's address to be valid, the connection itself is not considered
'established'. In particular, Alice may have many addresses for Bob that she
considers valid.

The PONG message is protected with a nonce/challenge against replay attacks
and uses an expiration time for the signature (but those are almost
implementation details).
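
The freshness checks on the PONG can be illustrated with a toy structure. The
type and function names are invented for this sketch; the real message also
carries the signed address and the signature itself, which must be verified
with purpose @code{GNUNET_SIGNATURE_PURPOSE_TRANSPORT_PONG_OWN}:

```c
#include <assert.h>
#include <stdint.h>

/* Toy PONG: just the anti-replay nonce and the signature expiration. */
struct ToyPong
{
  uint32_t challenge;   /* must echo the nonce Alice put into her PING */
  uint64_t expiration;  /* absolute expiration time of the signature */
};

/* Checks Alice would perform (in addition to the signature and address
   checks): the nonce must match her outstanding PING and the PONG must
   not be expired. */
static int
pong_fresh (const struct ToyPong *p,
            uint32_t sent_challenge,
            uint64_t now)
{
  return (p->challenge == sent_challenge) && (now < p->expiration);
}
```

A replayed PONG fails the nonce comparison, and a PONG held back by an
attacker past its expiration time fails the second check.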

@node NAT library
@section NAT library
@c %**end of header

The goal of the GNUnet NAT library is to provide a general-purpose API for NAT
traversal @strong{without} third-party support. So protocols that involve
contacting a third peer to help establish a connection between two peers are
outside of the scope of this API. That does not mean that GNUnet doesn't
support involving a third peer (we can do this with the distance-vector
transport or using application-level protocols), it just means that the NAT API
is not concerned with this possibility. The API is written so that it will work
for IPv6-NAT in the future as well as current IPv4-NAT. Furthermore, the NAT
API is always used, even for peers that are not behind NAT --- in that case,
the mapping provided is simply the identity.

NAT traversal is initiated by calling @code{GNUNET_NAT_register}. Given a set
of addresses that the peer has locally bound to (TCP or UDP), the NAT library
will return (via callback) a (possibly longer) list of addresses the peer
@strong{might} be reachable under. Internally, depending on the configuration,
the NAT library will try to punch a hole (using UPnP) or just "know" that the
NAT was manually punched and generate the respective external IP address (the
one that should be globally visible) based on the given information.

The NAT library also supports ICMP-based NAT traversal. Here, the other peer
can request connection-reversal by this peer (in this special case, the peer is
even allowed to configure a port number of zero). If the NAT library detects a
connection-reversal request, it returns the respective target address to the
client as well. It should be noted that connection-reversal is currently only
intended for TCP, so other plugins @strong{must} pass @code{NULL} for the
reversal callback. Naturally, the NAT library also supports requesting
connection reversal from a remote peer (@code{GNUNET_NAT_run_client}).

Once initialized, the NAT handle can be used to test if a given address is
possibly a valid address for this peer (@code{GNUNET_NAT_test_address}). This
is used for validating our addresses when generating PONGs.

Finally, the NAT library contains an API to test if our NAT configuration is
correct. Using @code{GNUNET_NAT_test_start} @strong{before} binding to the
respective port, the NAT library can be used to test if the configuration
works. The test function acts as a local client, initializes the NAT traversal
and then contacts a @code{gnunet-nat-server} (running by default on
@code{gnunet.org}) to ask for a connection to be established. This way, it is
easy to test whether the current NAT configuration is valid.

@node Distance-Vector plugin
@section Distance-Vector plugin
@c %**end of header

The Distance Vector (DV) transport is a transport mechanism that allows peers
to act as relays for each other, thereby connecting peers that would otherwise
be unable to connect. This gives a larger connection set to applications that
may work better with more peers to choose from (for example, File Sharing
and/or DHT).

The Distance Vector transport essentially has two functions. The first is
"gossiping" connection information about more distant peers to directly
connected peers. The second is taking messages intended for non-directly
connected peers and encapsulating them in a DV wrapper that contains the
required information for routing the message through forwarding peers. Via
gossiping, optimal routes through the known DV neighborhood are discovered and
utilized and the message encapsulation provides some benefits in addition to
simply getting the message from the correct source to the proper destination.

The gossiping function of DV provides an up to date routing table of peers that
are available up to some number of hops. We call this a fisheye view of the
network (like a fish, nearby objects are known while more distant ones are
unknown). Gossip messages are sent only to directly connected peers, but they
are sent about other known peers within the "fisheye distance". Whenever two
peers connect, they immediately gossip to each other about their appropriate
other neighbors. They also gossip about the newly connected peer to previously
connected neighbors. In order to keep the routing tables up to date, disconnect
notifications are propagated as gossip as well (because disconnects may not be
sent/received, timeouts are also used to remove stale routing table entries).

Routing of messages via DV is straightforward. When the DV transport is
notified of a message destined for a non-direct neighbor, the appropriate
forwarding peer is selected, and the base message is encapsulated in a DV
message which contains information about the initial peer and the intended
recipient. At each forwarding hop, the initial peer is validated (the
forwarding peer ensures that it has the initial peer in its neighborhood,
otherwise the message is dropped). Next the base message is re-encapsulated in
a new DV message for the next hop in the forwarding chain (or delivered to the
current peer, if it has arrived at the destination).

Assume a three peer network with peers Alice, Bob and Carol. Assume that Alice
<-> Bob and Bob <-> Carol are direct (e.g. over TCP or UDP transports)
connections, but that Alice cannot directly connect to Carol. This may be the
case due to NAT or firewall restrictions, or perhaps based on one of the peers'
respective configurations. If the Distance Vector transport is enabled on all
three peers, it will automatically discover (from the gossip protocol) that
Alice and Carol can connect via Bob and provide a "virtual" Alice <-> Carol
connection. Routing between Alice and Carol happens as follows: Alice creates a
message destined for Carol and notifies the DV transport about it. The DV
transport at Alice looks up Carol in the routing table and finds that the
message must be sent through Bob for Carol. The message is encapsulated setting
Alice as the initiator and Carol as the destination and sent to Bob. Bob
receives the message, verifies that both Alice and Carol are known to Bob, and
re-wraps the message in a new DV message for Carol. The DV transport at Carol
receives this message, unwraps the original message, and delivers it to Carol
as though it came directly from Alice.
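
The routing decision in the Alice/Bob/Carol example boils down to a next-hop
lookup. A minimal sketch follows; the table layout and function name are
invented for illustration, whereas the real DV transport maintains this state
dynamically from gossip:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Alice's routing table after gossip: destination -> direct next hop. */
struct Route
{
  const char *dest;
  const char *next_hop;
};

static const struct Route table[] = {
  { "Bob",   "Bob" },  /* direct neighbor */
  { "Carol", "Bob" },  /* learned via DV gossip: reachable through Bob */
};

/* Returns the directly connected peer to forward to, or NULL if the
   destination is unknown (the message would then be dropped). */
static const char *
next_hop (const char *dest)
{
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (0 == strcmp (table[i].dest, dest))
      return table[i].next_hop;
  return NULL;
}
```

For a message to Carol, the lookup yields Bob, and the DV wrapper is then
addressed to Bob with Carol recorded as the final recipient.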

@node SMTP plugin
@section SMTP plugin
@c %**end of header

This page describes the new SMTP transport plugin for GNUnet as it exists in
the 0.7.x and 0.8.x branch. SMTP support is currently not available in GNUnet
0.9.x. This page also describes the transport layer abstraction (as it existed
in 0.7.x and 0.8.x) in more detail and gives some benchmarking results. The
performance results presented are quite old and may be outdated at this point.
@itemize @bullet
@item Why use SMTP for a peer-to-peer transport?
@item How does it work?
@item How do I configure my peer?
@item How do I test if it works?
@item How fast is it?
@item Is there any additional documentation?
@end itemize


@menu
* Why use SMTP for a peer-to-peer transport?::
* How does it work?::
* How do I configure my peer?::
* How do I test if it works?::
* How fast is it?::
@end menu

@node Why use SMTP for a peer-to-peer transport?
@subsection Why use SMTP for a peer-to-peer transport?
@c %**end of header

There are many reasons why one would not want to use SMTP:
@itemize @bullet
@item SMTP uses more bandwidth than TCP, UDP or HTTP.
@item SMTP has a much higher latency.
@item SMTP requires significantly more computation (encoding and decoding time)
for the peers.
@item SMTP is significantly more complicated to configure.
@item SMTP may be abused by tricking GNUnet into sending mail to@
non-participating third parties.
@end itemize

So why would anybody want to use SMTP?
@itemize @bullet
@item SMTP can be used to contact peers behind NAT boxes (in virtual private
networks).
@item SMTP can be used to circumvent policies that limit or prohibit
peer-to-peer traffic by masking as "legitimate" traffic.
@item SMTP uses E-mail addresses which are independent of a specific IP, which
can be useful to address peers that use dynamic IP addresses.
@item SMTP can be used to initiate a connection (e.g. initial address exchange)
and peers can then negotiate the use of a more efficient protocol (e.g. TCP)
for the actual communication.
@end itemize

In summary, SMTP can for example be used to send a message to a peer behind a
NAT box that has a dynamic IP to tell the peer to establish a TCP connection
to a peer outside of the private network. Even an extraordinary overhead for
this first message would be irrelevant in this type of situation.

@node How does it work?
@subsection How does it work?
@c %**end of header

When a GNUnet peer needs to send a message to another GNUnet peer that has
advertised (only) an SMTP transport address, GNUnet base64-encodes the message
and sends it in an E-mail to the advertised address. The advertisement
contains a filter which is placed in the E-mail header, such that the
receiving host can filter the tagged E-mails and forward them to the GNUnet
peer process. The filter can be specified individually by each peer and be changed
over time. This makes it impossible to censor GNUnet E-mail messages by
searching for a generic filter.
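
The base64 step can be illustrated with a standard encoder. This is the
generic RFC 4648 encoding, not GNUnet's actual implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Generic base64 encoder (RFC 4648 alphabet); returns a malloc'd,
   NUL-terminated string.  The SMTP transport uses this kind of encoding
   to make binary GNUnet messages mail-safe. */
static char *
base64_encode (const unsigned char *in, size_t len)
{
  static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  char *out = malloc (4 * ((len + 2) / 3) + 1);
  size_t i, o = 0;

  for (i = 0; i + 2 < len; i += 3)
  {
    uint32_t v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
    out[o++] = tbl[(v >> 18) & 63];
    out[o++] = tbl[(v >> 12) & 63];
    out[o++] = tbl[(v >> 6) & 63];
    out[o++] = tbl[v & 63];
  }
  if (i < len) /* 1 or 2 trailing bytes: pad with '=' */
  {
    uint32_t v = in[i] << 16;
    if (i + 1 < len)
      v |= in[i + 1] << 8;
    out[o++] = tbl[(v >> 18) & 63];
    out[o++] = tbl[(v >> 12) & 63];
    out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
    out[o++] = '=';
  }
  out[o] = '\0';
  return out;
}
```

For example, encoding the three bytes "Man" yields "TWFu"; every three input
bytes become four mail-safe output characters, which accounts for part of
SMTP's bandwidth overhead.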

@node How do I configure my peer?
@subsection How do I configure my peer?
@c %**end of header

First, you need to configure @code{procmail} to filter your inbound E-mail for
GNUnet traffic. The GNUnet messages must be delivered into a pipe, for example
@code{/tmp/gnunet.smtp}. You also need to define a filter that is used by
procmail to detect GNUnet messages. You are free to choose whichever filter
you like, but you should make sure that it does not occur in your other
E-mail. In our example, we will use @code{X-mailer: GNUnet}. The
@code{~/.procmailrc} configuration file then looks like this:
@example
:0:
* ^X-mailer: GNUnet
/tmp/gnunet.smtp
# where do you want your other e-mail delivered to (default: /var/spool/mail/)
:0:
/var/spool/mail/
@end example

After adding this file, first make sure that your regular E-mail still works
(e.g. by sending an E-mail to yourself). Then edit the GNUnet configuration.
In the section @code{SMTP} you need to specify your E-mail address under
@code{EMAIL}, your mail server (for outgoing mail) under @code{SERVER}, the
filter (X-mailer: GNUnet in the example) under @code{FILTER} and the name of
the pipe under @code{PIPE}.@ The completed section could then look like this:
@example
EMAIL = me@@mail.gnu.org
MTU = 65000
SERVER = mail.gnu.org:25
FILTER = "X-mailer: GNUnet"
PIPE = /tmp/gnunet.smtp
@end example

Finally, you need to add @code{smtp} to the list of @code{TRANSPORTS} in the
@code{GNUNETD} section. GNUnet peers will use the E-mail address that you
specified to contact your peer until the advertisement times out. Thus, if you
are not sure if everything works properly or if you are not planning to be
online for a long time, you may want to configure this timeout to be short,
e.g. just one hour. For this, set @code{HELLOEXPIRES} to @code{1} in the
@code{GNUNETD} section.

This should be it, but you will probably want to test your setup first.@
@node How do I test if it works?
@subsection How do I test if it works?
@c %**end of header

Any transport can be subjected to some rudimentary tests using the
@code{gnunet-transport-check} tool. The tool sends a message to the local node
via the transport and checks that a valid message is received. While this test
does not involve other peers and cannot check if firewalls or other network
obstacles prohibit proper operation, it is a great test case for the SMTP
transport since it tests nearly all of the functionality.

@code{gnunet-transport-check} should only be used without running
@code{gnunetd} at the same time. By default, @code{gnunet-transport-check}
tests all transports that are specified in the configuration file. But you can
specifically test SMTP by giving the option @code{--transport=smtp}.

Note that this test always checks if a transport can receive and send. While
you can configure most transports to only receive or only send messages, this
test will only work if you have configured the transport to send and receive
messages.

@node How fast is it?
@subsection How fast is it?
@c %**end of header

We have measured the performance of the UDP, TCP and SMTP transport layers
directly and when used from an application using the GNUnet core. Measuring
just the transport layer gives a better view of the actual overhead of the
protocol, whereas evaluating the transport from the application puts the
overhead into perspective from a practical point of view.

The loopback measurements of the SMTP transport were performed on three
different machines spanning a range of modern SMTP configurations. We used a
PIII-800 running RedHat 7.3 with the Purdue Computer Science configuration
which includes filters for spam. We also used a Xeon 2 GHz with a vanilla
RedHat 8.0 sendmail configuration. Furthermore, we used qmail on a PIII-1000
running Sorcerer GNU Linux (SGL). The numbers for UDP and TCP are provided
using the SGL configuration. The qmail benchmark uses qmail's internal
filtering whereas the sendmail benchmarks relies on procmail to filter and
deliver the mail. We used the transport layer to send a message of b bytes
(excluding transport protocol headers) directly to the local machine. This
way, network latency and packet loss on the wire have no impact on the
timings. n messages were sent sequentially over the transport layer, sending
message i+1 after the i-th message was received. All messages were sent over
the same connection and the time to establish the connection was not taken
into account since this overhead is minuscule in practice --- as long as a
connection is used for a significant number of messages.

@multitable @columnfractions .20 .15 .15 .15 .15 .15
@headitem Message size @tab UDP @tab TCP @tab SMTP (Purdue sendmail) @tab SMTP (RH 8.0) @tab SMTP (SGL qmail)
@item  11 bytes @tab 31 ms @tab 55 ms @tab  781 s @tab 77 s @tab 24 s
@item  407 bytes @tab 37 ms @tab 62 ms @tab  789 s @tab 78 s @tab 25 s
@item 1,221 bytes @tab 46 ms @tab 73 ms @tab  804 s @tab 78 s @tab 25 s
@end multitable

The benchmarks show that UDP and TCP are, as expected, both significantly
faster compared with any of the SMTP services. Among the SMTP implementations,
there can be significant differences depending on the SMTP configuration.
Filtering with an external tool like procmail that needs to re-parse its
configuration for each mail can be very expensive. Applying spam filters can
also significantly impact the performance of the underlying SMTP
implementation. The microbenchmark shows that SMTP can be a viable solution
for initiating peer-to-peer sessions: a couple of seconds to connect to a peer
are probably not even going to be noticed by users. The next benchmark
measures the possible throughput for a transport. Throughput can be measured
by sending multiple messages in parallel and measuring packet loss. Note that
not only UDP but also the TCP transport can actually lose messages since the
TCP implementation drops messages if the @code{write} to the socket would
block. While the SMTP protocol never drops messages itself, it is often so
slow that only a fraction of the messages can be sent and received in the
given time-bounds. For this benchmark we report the message loss after
allowing t time for sending m messages. If messages were not sent (or
received) after an overall timeout of t, they were considered lost. The
benchmark was performed using two Xeon 2 GHz machines running RedHat 8.0 with
sendmail. The machines were connected with a direct 100 MBit ethernet
connection.@ Figures udp1200, tcp1200 and smtp-MTUs show that the throughput
for messages of size 1,200 octets is 2,343 kbps, 3,310 kbps and 6 kbps for
UDP, TCP and SMTP respectively. The high per-message overhead of SMTP can be
improved by increasing the MTU; for example, an MTU of 12,000 octets improves
the throughput to 13 kbps as figure smtp-MTUs shows. Our research paper has
some more details on the benchmarking results.

@node Bluetooth plugin
@section Bluetooth plugin
@c %**end of header

This page describes the new Bluetooth transport plugin for GNUnet. The plugin
is still in the testing stage, so don't expect it to work perfectly. If you
have any questions or problems, just ask on the IRC channel.
@itemize @bullet
@item What do I need to use the Bluetooth plugin transport?
@item How does it work?
@item What possible errors should I be aware of?
@item How do I configure my peer?
@item How can I test it?
@end itemize



@menu
* What do I need to use the Bluetooth plugin transport?::
* How does it work2?::
* What possible errors should I be aware of?::
* How do I configure my peer2?::
* How can I test it?::
* The implementation of the Bluetooth transport plugin::
@end menu

@node What do I need to use the Bluetooth plugin transport?
@subsection What do I need to use the Bluetooth plugin transport?
@c %**end of header

If you are a Linux user and you want to use the Bluetooth transport plugin,
you should install the BlueZ development libraries (if they aren't already
installed). For instructions on how to install the libraries, check the
BlueZ site (@uref{http://www.bluez.org/, http://www.bluez.org}).
If you are not sure whether you have the necessary libraries, don't worry:
just run the GNUnet configure script and it will warn you at the end if the
necessary libraries are missing.

If you are a Windows user, you should have @emph{MinGW}/@emph{MSys2}
installed with the latest updates (especially the @emph{ws2bth} header). If
this is your first build of GNUnet on Windows, you should check out the
SBuild repository: it semi-automatically assembles a
@emph{MinGW}/@emph{MSys2} installation with a lot of extra packages which are
needed for the GNUnet build, so this will ease your work. Finally, make sure
that you have the correct drivers for your Bluetooth device installed and
that your device is on and in discoverable mode. The Windows Bluetooth stack
supports only the RFCOMM protocol, so we cannot turn on your device
programmatically!

@node How does it work2?
@subsection How does it work?
@c %**end of header

The Bluetooth transport plugin uses virtually the same code as the WLAN plugin
and only the helper binary is different. The helper takes a single argument,
which represents the interface name and is specified in the configuration
file. Here are the basic steps that are followed by the helper binary used on
Linux:

@itemize @bullet
@item it verifies if the name corresponds to a Bluetooth interface name
@item it verifies if the interface is up (if it is not, it tries to bring it up)
@item it tries to enable the page and inquiry scan in order to make the device
discoverable and to accept incoming connection requests
@emph{The above operations require root access so you should start the
transport plugin with root privileges.}
@item it finds an available port number and registers an SDP service (which
other peers use to discover the port number the server is listening on), and
switches the socket into listening mode
@item it sends a HELLO message with its address
@item finally, it forwards traffic from the reading sockets to STDOUT and
from STDIN to the writing socket
@end itemize

Once in a while the device will perform an inquiry scan to discover nearby
devices, and it will randomly send them HELLO messages for peer discovery.

@node What possible errors should I be aware of?
@subsection What possible errors should I be aware of?
@c %**end of header

@emph{This section is dedicated to Linux users.}

There are many ways in which things could go wrong, but here are some tools
you can use to debug, and some scenarios.
@itemize @bullet

@item @code{bluetoothd -n -d}: use this command to run the Bluetooth daemon
in the foreground with debug output enabled

@item @code{hciconfig}: can be used to configure Bluetooth devices. If you
run it without any arguments, it prints information about the state of the
interfaces. So if you receive an error that the device couldn't be brought
up, you should try to bring it up manually to see if that works (use
@code{hciconfig hciX up}). If you can't, and the Bluetooth address has the
form 00:00:00:00:00:00, it means that something is wrong with the D-Bus
daemon or with the Bluetooth daemon. Use the @code{bluetoothd} tool to see
the logs.

@item @code{sdptool}: can be used to control and interrogate SDP servers. If
you encounter problems regarding the SDP server (for example, the SDP server
is down), you should check whether the D-Bus daemon is running correctly and
whether the Bluetooth daemon started correctly (use the @code{bluetoothd}
tool). Also, sometimes the SDP service works but the device somehow couldn't
register its service. Use @code{sdptool browse [dev-address]} to see if the
service is registered. There should be a service with the name of the
interface and GNUnet as the provider.

@item @code{hcitool}: another useful tool which can be used to configure the
device and to send particular commands to it.

@item @code{hcidump}: can be used for low-level debugging
@end itemize

@node How do I configure my peer2?
@subsection How do I configure my peer?
@c %**end of header

On Linux, you just have to be sure that the interface name corresponds to the
one that you want to use. Use the @code{hciconfig} tool to check that. By
default it is set to hci0 but you can change it.

A basic configuration looks like this:
@example
[transport-bluetooth]
# Name of the interface (typically hciX)
INTERFACE = hci0
# Real hardware, no testing
TESTMODE = 0
TESTING_IGNORE_KEYS = ACCEPT_FROM;
@end example


In order to use the Bluetooth transport plugin when the transport service is
started, you must add the plugin name to the default transport service plugins
list. For example:
@example
[transport]
# ...
PLUGINS = dns bluetooth
# ...
@end example

If you want to use only the Bluetooth plugin, set @code{PLUGINS = bluetooth}.

On Windows, you cannot specify which device to use. The only thing you need
to do is add @emph{bluetooth} to the plugins list of the transport service.

@node How can I test it?
@subsection How can I test it?
@c %**end of header

If you have two Bluetooth devices on the same machine which use Linux you
must:
@itemize @bullet

@item create two different configuration files (one which will use the first
interface (@emph{hci0}) and the other which will use the second interface
(@emph{hci1})). Let's name them @emph{peer1.conf} and @emph{peer2.conf}.

@item run @code{gnunet-peerinfo -c peerX.conf -s} in order to generate the
peers' private keys. The @strong{X} must be replaced with 1 or 2.

@item run @code{gnunet-arm -c peerX.conf -s -i=transport} in order to start the
transport service. (Make sure that you have "bluetooth" on the transport
plugins list if the Bluetooth transport service doesn't start.)

@item run @code{gnunet-peerinfo -c peer1.conf -s} to get the first peer's ID.
If you already know your peer ID (you saved it from the first command), this
can be skipped.

@item run @code{gnunet-transport -c peer2.conf -p=PEER1_ID -s} to start sending
data for benchmarking to the other peer.
@end itemize


This scenario will try to connect the second peer to the first one and then
start sending data for benchmarking.

On Windows you cannot test the plugin functionality using two Bluetooth
devices on the same machine, because after you install the drivers conflicts
arise between the Bluetooth stacks. (At least that is what happened on my
machine: I wasn't able to use the Bluesoleil stack and the WIDCOMM one at
the same time.)

If you have two different machines and your configuration files are correct,
you can use the same scenario presented at the beginning of this section.

Another way to test the plugin functionality is to create your own application
which will use the GNUnet framework with the Bluetooth transport service.

@node The implementation of the Bluetooth transport plugin
@subsection The implementation of the Bluetooth transport plugin
@c %**end of header

This section describes the implementation of the Bluetooth transport plugin.

First I want to remind you that the Bluetooth transport plugin uses virtually
the same code as the WLAN plugin; only the helper binary is different. Also,
the scope of the helper binary from the Bluetooth transport plugin is the
same as the one used for the WLAN transport plugin: it accesses the interface
and then forwards traffic in both directions between the Bluetooth interface
and the stdin/stdout of the process involved.

The Bluetooth transport plugin can be used on both Linux and Windows
platforms.




@menu
* Linux functionality::
* THE INITIALIZATION::
* THE LOOP::
* Details about the broadcast implementation::
* Windows functionality::
* Pending features::
@end menu

@node Linux functionality
@subsubsection Linux functionality
@c %**end of header

In order to implement the plugin functionality on Linux I used the BlueZ
stack. For the communication with the other devices I used the RFCOMM
protocol. Also I used the HCI protocol to gain some control over the device.
The helper binary takes a single argument (the name of the Bluetooth
interface) and is separated in two stages:

@c %** 'THE INITIALIZATION' should be in bigger letters or stand out, not
@c %** starting a new section?
@node THE INITIALIZATION
@subsubsection THE INITIALIZATION

@itemize @bullet
@item first, it checks if we have root privileges (@emph{Remember that we need
root privileges in order to be able to bring the interface up if it is down
or to change its state.}).

@item second, it verifies if the interface with the given name exists.

@strong{If the interface with that name exists and it is a Bluetooth
interface:}

@item it creates an RFCOMM socket which will be used for listening and calls
the @emph{open_device} method

In the @emph{open_device} method, it:
@itemize @bullet
@item creates a HCI socket used to send control events to the device
@item searches for the device ID using the interface name
@item saves the device MAC address
@item checks if the interface is down and tries to bring it UP
@item checks if the interface is in discoverable mode and tries to make it
discoverable
@item closes the HCI socket and binds the RFCOMM one
@item switches the RFCOMM socket in listening mode
@item registers the SDP service (the service will be used by the other devices
to get the port on which this device is listening)
@end itemize

@item drops the root privileges

@strong{If the interface is not a Bluetooth interface, the helper exits with
a suitable error.}
@end itemize

@c %** Same as for @node entry above
@node THE LOOP
@subsubsection THE LOOP

The helper binary uses a list where it saves all the connected neighbour
devices (@emph{neighbours.devices}) and two buffers (@emph{write_pout} and
@emph{write_std}). The first message sent is a control message with the
device's MAC address, in order to announce the peer's presence to the
neighbours. Here is a short description of what happens in the main loop:

@itemize @bullet
@item Whenever it receives something from STDIN, it processes the data and
saves the message in the first buffer (@emph{write_pout}). When it has
something in the buffer, it gets the destination address from the buffer,
searches for the destination address in the list (if there is no connection
with that device, it creates a new one and saves it to the list) and sends
the message.
@item Whenever it receives something on the listening socket, it accepts the
connection and saves the socket in a list of reading sockets.
@item Whenever it receives something from a reading socket, it parses the
message, verifies the CRC and saves it in the @emph{write_std} buffer in
order to be sent later to STDOUT.
@end itemize

So in the main loop we use the select function to wait until one of the file
descriptors saved in one of the two file descriptor sets is ready to use.
The first set (@emph{rfds}) represents the reading set and it may contain the
list of reading sockets, the STDIN file descriptor or the listening socket.
The second set (@emph{wfds}) is the writing set and it may contain the
sending sockets or the STDOUT file descriptor. After the select function
returns, we check which file descriptor is ready and act on that kind of
event. @emph{For example:} if it is the listening socket, we accept a new
connection and save the socket in the reading list; if it is the STDOUT file
descriptor, we write to STDOUT the message from the @emph{write_std} buffer.

To find out on which port a device is listening, we connect to the local SDP
server and search for the registered service of that device.

@emph{You should be aware of the fact that if the device fails to connect to
another one when trying to send a message, it will attempt one more time. If
it fails again, it skips the message.}
@emph{Also, you should know that the
Bluetooth transport plugin has support for @strong{broadcast messages}.}

@node Details about the broadcast implementation
@subsubsection Details about the broadcast implementation
@c %**end of header

First I want to point out that the broadcast functionality for the CONTROL
messages is not implemented in a conventional way. Since the inquiry scan
takes a long time, sending a message to all the discoverable devices would be
slow, so I decided to tackle the problem in a different way. Here is how I
did it:

@itemize @bullet
@item The first time I have to broadcast a message, I perform an inquiry scan
and save all the discovered devices' addresses to a vector.
@item After the inquiry scan ends I take the first address from the list and I
try to connect to it. If it fails, I try to connect to the next one. If it
succeeds, I save the socket to a list and send the message to the device.
@item When I have to broadcast another message, I first search the list for a
new device which I'm not connected to. If there is no new device on the list,
I go to the beginning of the list and send the message to the old devices.
After 5 cycles I perform a new inquiry scan to check whether there are new
discoverable devices, and save them to the list. If there are no new
discoverable devices, I reset the cycling counter and go through the old list
again, sending messages to the devices saved in it.
@end itemize

@strong{Therefore}:

@itemize @bullet
@item every time I have a broadcast message, I look in the list for a new
device and send the message to it
@item if I have reached the end of the list 5 times and I'm connected to all
the devices on the list, I perform a new inquiry scan. @emph{The number of
list cycles between inquiry scans can be increased by redefining the
MAX_LOOPS variable.}
@item when there are no new devices I send messages to the old ones.
@end itemize

This way, the broadcast control messages will reach the devices, but with some delay.

@emph{NOTICE:} When I have to send a message to a certain device, I first
check the broadcast list to see if we are connected to that device. If not,
we try to connect to it, and in case of success we save the address and the
socket on the list. If we are already connected to that device, we simply use
the socket.

@node Windows functionality
@subsubsection Windows functionality
@c %**end of header

For Windows I decided to use the Microsoft Bluetooth stack, which has the
advantage of coming standard with Windows since XP SP2. The main disadvantage
is that it only supports the RFCOMM protocol, so we will not have low-level
control over the Bluetooth device. Therefore it is the user's responsibility
to check that the device is up and in discoverable mode. Also, there are no
tools which could be used for debugging to read the data coming from and
going to a Bluetooth device, which obviously hindered my work. Another thing
that slowed down the implementation of the plugin (besides that I wasn't too
familiar with the Win32 API) was that there were some bugs in MinGW regarding
Bluetooth. They are solved now, but you should keep in mind that you need the
latest updates (especially the @emph{ws2bth} header).

Besides the fact that it uses the Windows Sockets, the Windows implementation
follows the same principles as the Linux one:

@itemize @bullet
@item
It has an initialization part where it initializes the Windows Sockets,
creates an RFCOMM socket which will be bound and switched to listening mode,
and registers an SDP service.
In the Microsoft Bluetooth API there are two ways to work with the SDP:
@itemize @bullet
@item an easy way which works with very simple service records
@item a hard way which is useful when you need to update or to delete the
record
@end itemize
@end itemize

Since I only needed the SDP service to find out on which port the device is
listening, and that did not change, I decided to use the easy way. In order
to register the service I used the @emph{WSASetService} function, and I
generated the @emph{Universally Unique Identifier} with the
@emph{guidgen.exe} Windows tool.

In the loop section the only difference from the Linux implementation is that
I used the GNUNET_NETWORK library for functions like @emph{accept},
@emph{bind}, @emph{connect} or @emph{select}. I decided to use the
GNUNET_NETWORK library because I also needed to interact with the STDIN and
STDOUT handles and on Windows the select function is only defined for sockets,
and it will not work for arbitrary file handles.

Another difference between the Linux and Windows implementations is that on
Linux the Bluetooth address is represented using 48 bits, while on Windows it
is represented using 64 bits. Therefore I had to make some changes to the
@emph{plugin_transport_wlan} header.

Also, currently on Windows the Bluetooth plugin doesn't have support for
broadcast messages. When it receives a broadcast message it will skip it.

@node Pending features
@subsubsection Pending features
@c %**end of header

@itemize @bullet
@item Implement the broadcast functionality on Windows @emph{(currently in
progress)}
@item Implement a testcase for the helper: @emph{the testcase consists of a
program which emulates the plugin and uses the helper. It will simulate
connections, disconnections and data transfers.}
@end itemize

If you have a new idea about a feature of the plugin or suggestions about how
I could improve the implementation you are welcome to comment or to contact
me.

@node WLAN plugin
@section WLAN plugin
@c %**end of header

This section documents how the WLAN transport plugin works. Parts which are
not yet implemented or could be implemented better are described at the end.

@node The ATS Subsystem
@section The ATS Subsystem
@c %**end of header

ATS stands for "automatic transport selection", and the function of ATS in
GNUnet is to decide on which address (and thus transport plugin) should be used
for two peers to communicate, and what bandwidth limits should be imposed on
such an individual connection. To help ATS make an informed decision,
higher-level services inform the ATS service about their requirements and the
quality of the service rendered. The ATS service also interacts with the
transport service to be appraised of working addresses and to communicate its
resource allocation decisions. Finally, the ATS service's operation can be
observed using a monitoring API.

The main logic of the ATS service only collects the available addresses,
their performance characteristics and the applications' requirements; it does
not make the actual allocation decision. This last critical step is left to
an ATS plugin, as we have implemented (currently three) different allocation
strategies which differ significantly in their performance and maturity, and
it is still unclear if any particular plugin is generally superior.

@node GNUnet's CORE Subsystem
@section GNUnet's CORE Subsystem
@c %**end of header

The CORE subsystem in GNUnet is responsible for securing link-layer
communications between nodes in the GNUnet overlay network. CORE builds on the
TRANSPORT subsystem which provides for the actual, insecure, unreliable
link-layer communication (for example, via UDP or WLAN), and then adds
fundamental security to the connections:

@itemize @bullet
@item confidentiality with so-called perfect forward secrecy; we use
@uref{http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman,
ECDHE} powered by @uref{http://cr.yp.to/ecdh.html, Curve25519} for the key
exchange and then use symmetric encryption, encrypting with both
@uref{http://en.wikipedia.org/wiki/Rijndael, AES-256} and
@uref{http://en.wikipedia.org/wiki/Twofish, Twofish}
@item @uref{http://en.wikipedia.org/wiki/Authentication, authentication} is
achieved by signing the ephemeral keys using @uref{http://ed25519.cr.yp.to/,
Ed25519}, a deterministic variant of @uref{http://en.wikipedia.org/wiki/ECDSA,
ECDSA}
@item integrity protection (using @uref{http://en.wikipedia.org/wiki/SHA-2,
SHA-512} to do @uref{http://en.wikipedia.org/wiki/Authenticated_encryption,
encrypt-then-MAC})
@item @uref{http://en.wikipedia.org/wiki/Replay_attack, replay} protection
(using nonces, timestamps, challenge-response, message counters and ephemeral
keys)
@item liveness (keep-alive messages, timeout)
@end itemize

@menu
* Limitations::
* When is a peer "connected"?::
* libgnunetcore::
* The CORE Client-Service Protocol::
* The CORE Peer-to-Peer Protocol::
@end menu

@node Limitations
@subsection Limitations
@c %**end of header

CORE does not perform @uref{http://en.wikipedia.org/wiki/Routing, routing};
using CORE it is only possible to communicate with peers that happen to
already be "directly" connected with each other. CORE also does not have an
API to allow applications to establish such "direct" connections --- for this,
applications can ask TRANSPORT, but TRANSPORT might not be able to establish a
"direct" connection. The TOPOLOGY subsystem is responsible for trying to keep
a few "direct" connections open at all times. Applications that need to talk
to particular peers should use the CADET subsystem, as it can establish
arbitrary "indirect" connections.

Because CORE does not perform routing, CORE must only be used directly by
applications that either perform their own routing logic (such as anonymous
file-sharing) or that do not require routing, for example because they are
based on flooding the network. CORE communication is unreliable and delivery
is possibly out-of-order. Applications that require reliable communication
should use the CADET service. Each application can only queue one message per
target peer with the CORE service at any time; messages cannot be larger than
approximately 63 kilobytes. If messages are small, CORE may group multiple
messages (possibly from different applications) prior to encryption. If
permitted by the application (using the @uref{http://baus.net/on-tcp_cork/,
cork} option), CORE may delay transmissions to facilitate grouping of multiple
small messages. If cork is not enabled, CORE will transmit the message as soon
as TRANSPORT allows it (TRANSPORT is responsible for limiting bandwidth and
congestion control). CORE does not allow flow control; applications are
expected to process messages at line-speed. If flow control is needed,
applications should use the CADET service.

@node When is a peer "connected"?
@subsection When is a peer "connected"?
@c %**end of header

In addition to the security features mentioned above, CORE also provides one
additional key feature to applications using it, and that is a limited form of
protocol-compatibility checking. CORE distinguishes between TRANSPORT-level
connections (which enable communication with other peers) and
application-level connections. Applications using the CORE API will
(typically) learn about application-level connections from CORE, and not about
TRANSPORT-level connections. When a typical application uses CORE, it will
specify a set of message types (from @code{gnunet_protocols.h}) that it
understands. CORE will then notify the application about connections it has
with other peers if and only if those applications registered an intersecting
set of message types with their CORE service. Thus, it is quite possible that
CORE only exposes a subset of the established direct connections to a
particular application --- and different applications running above CORE might
see different sets of connections at the same time.

A special case are applications that do not register a handler for any message
type. CORE assumes that these applications merely want to monitor connections
(or "all" messages via other callbacks) and will notify those applications
about all connections. This is used, for example, by the @code{gnunet-core}
command-line tool to display the active connections. Note that it is also
possible that the TRANSPORT service has more active connections than the CORE
service, as the CORE service first has to perform a key exchange with
connecting peers before exchanging information about supported message types
and notifying applications about the new connection.

@node libgnunetcore
@subsection libgnunetcore
@c %**end of header

The CORE API (defined in @code{gnunet_core_service.h}) is the basic messaging
API used by P2P applications built using GNUnet. It provides applications the
ability to send and receive encrypted messages to the peer's "directly"
connected neighbours.

As CORE connections are generally "direct" connections, applications must not
assume that they can connect to arbitrary peers this way, as "direct"
connections may not always be possible. Applications using CORE are notified
about which peers are connected. Creating new "direct" connections must be
done using the TRANSPORT API.

The CORE API provides unreliable, out-of-order delivery. While the
implementation tries to ensure timely, in-order delivery, message losses and
reordering are not detected and must be tolerated by the application. Most
importantly, CORE will NOT perform retransmission if messages could not be
delivered.

Note that CORE allows applications to queue one message per connected peer.
The rate at which each connection operates is influenced by the preferences
expressed by the local application as well as restrictions imposed by the
other peer. Local applications can express their preferences for particular
connections using the "performance" API of the ATS service.

Applications that require more sophisticated transmission capabilities, such
as TCP-like behavior, or that intend to send messages to arbitrary remote
peers, should use the CADET API.

The typical use of the CORE API is to connect to the CORE service using
@code{GNUNET_CORE_connect}, process events from the CORE service (such as
peers connecting, peers disconnecting and incoming messages) and send messages
to connected peers using @code{GNUNET_CORE_notify_transmit_ready}. Note that
applications must cancel pending transmission requests if they receive a
disconnect event for a peer that had a transmission pending; furthermore,
queueing more than one transmission request per peer per application using the
service is not permitted.

The CORE API also allows applications to monitor all communications of the
peer prior to encryption (for outgoing messages) or after decryption (for
incoming messages). This can be useful for debugging, diagnostics or to
establish the presence of cover traffic (for anonymity). As monitoring
applications are often not interested in the payload, the monitoring callbacks
can be configured to only provide the message headers (including the message
type and size) instead of copying the full data stream to the monitoring
client.

The init callback of the @code{GNUNET_CORE_connect} function is called with
the hash of the public key of the peer. This public key is used to identify
the peer globally in the GNUnet network. Applications are encouraged to check
that the provided hash matches the hash that they are using (as theoretically
the application may be using a different configuration file with a different
private key, which would result in hard to find bugs).

As with most service APIs, the CORE API isolates applications from crashes of
the CORE service. If the CORE service crashes, the application will see
disconnect events for all existing connections. Once the connections are
re-established, the application will receive matching connect events.

@node The CORE Client-Service Protocol
@subsection The CORE Client-Service Protocol
@c %**end of header

This section describes the protocol between an application using the CORE
service (the client) and the CORE service process itself.


@menu
* Setup2::
* Notifications::
* Sending::
@end menu

@node Setup2
@subsubsection Setup2
@c %**end of header

When a client connects to the CORE service, it first sends a
@code{InitMessage} which specifies options for the connection and a set of
message type values which are supported by the application. The options
bitmask specifies which events the client would like to be notified about. The
options include:

@table @asis
@item @code{GNUNET_CORE_OPTION_NOTHING}
No notifications
@item @code{GNUNET_CORE_OPTION_STATUS_CHANGE}
Peers connecting and disconnecting
@item @code{GNUNET_CORE_OPTION_FULL_INBOUND}
All inbound messages (after decryption) with full payload
@item @code{GNUNET_CORE_OPTION_HDR_INBOUND}
Just the @code{MessageHeader} of all inbound messages
@item @code{GNUNET_CORE_OPTION_FULL_OUTBOUND}
All outbound messages (prior to encryption) with full payload
@item @code{GNUNET_CORE_OPTION_HDR_OUTBOUND}
Just the @code{MessageHeader} of all outbound messages
@end table

Typical applications will only monitor for connection status changes.

The CORE service responds to the @code{InitMessage} with an
@code{InitReplyMessage} which contains the peer's identity. Afterwards, both
CORE and the client can send messages.

@node Notifications
@subsubsection Notifications
@c %**end of header

The CORE will send @code{ConnectNotifyMessage}s and
@code{DisconnectNotifyMessage}s whenever peers connect or disconnect from the
CORE (assuming their type maps overlap with the message types registered by
the client). When the CORE receives a message that matches the set of message
types specified during the @code{InitMessage} (or if monitoring is enabled in
for inbound messages in the options), it sends a @code{NotifyTrafficMessage}
with the peer identity of the sender and the decrypted payload. The same
message format (except with @code{GNUNET_MESSAGE_TYPE_CORE_NOTIFY_OUTBOUND}
for the message type) is used to notify clients monitoring outbound messages;
here, the peer identity given is that of the receiver.

@node Sending
@subsubsection Sending
@c %**end of header

When a client wants to transmit a message, it first requests a transmission
slot by sending a @code{SendMessageRequest} which specifies the priority,
deadline and size of the message. Note that these values may be ignored by
CORE. When CORE is ready for the message, it answers with a
@code{SendMessageReady} response. The client can then transmit the payload
with a @code{SendMessage} message. Note that the actual message size in the
@code{SendMessage} is allowed to be smaller than the size in the original
request. A client may at any time send a fresh @code{SendMessageRequest},
which then supersedes the previous @code{SendMessageRequest}, which is then no
longer valid. The client can tell which @code{SendMessageRequest} the CORE
service's @code{SendMessageReady} message is for as all of these messages
contain a "unique" request ID (based on a counter incremented by the client
for each request).

@node The CORE Peer-to-Peer Protocol
@subsection The CORE Peer-to-Peer Protocol
@c %**end of header


@menu
* Creating the EphemeralKeyMessage::
* Establishing a connection::
* Encryption and Decryption::
* Type maps::
@end menu

@node Creating the EphemeralKeyMessage
@subsubsection Creating the EphemeralKeyMessage
@c %**end of header

When the CORE service starts, each peer creates a fresh ephemeral (ECC)
public-private key pair and signs the corresponding @code{EphemeralKeyMessage}
with its long-term key (which we usually call the peer's identity; the hash
of the public long-term key is what results in a @code{struct
GNUNET_PeerIdentity} in all GNUnet APIs). The ephemeral key is ONLY used for an
@uref{http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman,
ECDHE} exchange by the CORE service to establish symmetric session keys. A
peer will use the same @code{EphemeralKeyMessage} for all peers for
@code{REKEY_FREQUENCY}, which is usually 12 hours. After that time, it will
create a fresh ephemeral key (forgetting the old one) and broadcast the new
@code{EphemeralKeyMessage} to all connected peers, resulting in fresh
symmetric session keys. Note that peers independently decide on when to
discard ephemeral keys; it is not a protocol violation to discard keys more
often. Ephemeral keys are also never stored to disk; restarting a peer will
thus always create a fresh ephemeral key. The use of ephemeral keys is what
provides @uref{http://en.wikipedia.org/wiki/Forward_secrecy, forward secrecy}.

Just before transmission, the @code{EphemeralKeyMessage} is patched to reflect
the current sender_status, which specifies the current state of the connection
from the point of view of the sender. The possible values are:

@table @asis
@item KX_STATE_DOWN
Initial value, never used on the network.
@item KX_STATE_KEY_SENT
We sent our ephemeral key, but do not yet know the key of the other peer.
@item KX_STATE_KEY_RECEIVED
This peer has received a valid ephemeral key of the other peer, but we are
waiting for the other peer to confirm its authenticity (ability to decode)
via challenge-response.
@item KX_STATE_UP
The connection is fully up from the point of view of the sender (now
performing keep-alives).
@item KX_STATE_REKEY_SENT
The sender has initiated a rekeying operation; the other peer has so far
failed to confirm a working connection using the new ephemeral key.
@end table

@node Establishing a connection
@subsubsection Establishing a connection
@c %**end of header

Peers begin their interaction by sending an @code{EphemeralKeyMessage} to the
other peer once the TRANSPORT service notifies the CORE service about the
connection. When a peer receives an @code{EphemeralKeyMessage} with a status
indicating that the sender does not have the receiver's ephemeral key, the
receiver sends its own @code{EphemeralKeyMessage} in response. Additionally, if
the receiver has not yet confirmed the authenticity of the sender, it also
sends an (encrypted) @code{PingMessage} with a challenge (and the identity of
the target) to the other peer. Peers receiving a @code{PingMessage} respond
with an (encrypted) @code{PongMessage} which includes the challenge. Peers
receiving a @code{PongMessage} check the challenge, and if it matches, set the
connection to @code{KX_STATE_UP}.

@node Encryption and Decryption
@subsubsection Encryption and Decryption
@c %**end of header

All functions related to the key exchange and encryption/decryption of
messages can be found in @code{gnunet-service-core_kx.c} (except for the
cryptographic primitives, which are in @code{util/crypto*.c}). Given the key
material from ECDHE, a
@uref{http://en.wikipedia.org/wiki/Key_derivation_function, key derivation
function} is used to derive two pairs of encryption and decryption keys for
AES-256 and Twofish, as well as initialization vectors and authentication keys
(for @uref{http://en.wikipedia.org/wiki/HMAC, HMAC}). The HMAC is computed
over the encrypted payload. Encrypted messages include an iv_seed and the HMAC
in the header.

Each encrypted message in the CORE service includes a sequence number and a
timestamp in the encrypted payload. The CORE service remembers the largest
observed sequence number and a bit-mask which represents which of the previous
32 sequence numbers were already used. Messages with sequence numbers lower
than the largest observed sequence number minus 32 are discarded. Messages
with a timestamp that is more than @code{REKEY_TOLERANCE} (5 minutes) off are
also discarded. This of course means that system clocks need to be reasonably
synchronized for peers to be able to communicate. Additionally, as the
ephemeral key changes every 12h, a peer would not even be able to decrypt
messages older than 12h.

@node Type maps
@subsubsection Type maps
@c %**end of header

Once an encrypted connection has been established, peers begin to exchange
type maps. Type maps are used to allow the CORE service to determine which
(encrypted) connections should be shown to which applications. A type map is
an array of 65536 bits representing the different types of messages understood
by applications using the CORE service. Each CORE service maintains this map,
simply by setting the respective bit for each message type supported by any of
the applications using the CORE service. Note that bits for message types
embedded in higher-level protocols (such as MESH) will not be included in
these type maps.

Typically, the type map of a peer will be sparse. Thus, the CORE service
attempts to compress its type map using @code{gzip}-style compression
("deflate") prior to transmission. However, if the compression fails to
compact the map, the map may also be transmitted without compression; the two
cases result in @code{GNUNET_MESSAGE_TYPE_CORE_COMPRESSED_TYPE_MAP} and
@code{GNUNET_MESSAGE_TYPE_CORE_BINARY_TYPE_MAP} messages, respectively. Upon
receiving a type map, the respective CORE service notifies applications about
the connection to the other peer if they support any message type indicated in
the type map (or no message type at all). If the CORE service experiences a
connect or disconnect event from an application, it updates its type map
(setting or unsetting the respective bits) and notifies its neighbours about
the change. The CORE services of the neighbours then in turn generate connect
and disconnect events for the peer that sent the type map for their respective
applications. As CORE messages may be lost, the CORE service confirms
receiving a type map by sending back a
@code{GNUNET_MESSAGE_TYPE_CORE_CONFIRM_TYPE_MAP}. If such a confirmation (with
the correct hash of the type map) is not received, the sender will retransmit
the type map (with exponential back-off).

@node GNUnet's CADET subsystem
@section GNUnet's CADET subsystem

The CADET subsystem in GNUnet is responsible for secure end-to-end
communications between nodes in the GNUnet overlay network. CADET builds on the
CORE subsystem which provides for the link-layer communication and then adds
routing, forwarding and additional security to the connections. CADET offers
the same cryptographic services as CORE, but on an end-to-end level. This is
done so peers retransmitting traffic on behalf of other peers cannot access the
payload data.

@itemize @bullet
@item CADET provides confidentiality with so-called perfect forward secrecy; we
use ECDHE powered by Curve25519 for the key exchange and then use symmetric
encryption, encrypting with both AES-256 and Twofish
@item authentication is achieved by signing the ephemeral keys using Ed25519, a
deterministic variant of ECDSA
@item integrity protection (using SHA-512 to do encrypt-then-MAC, although only
256 bits are sent to reduce overhead)
@item replay protection (using nonces, timestamps, challenge-response, message
counters and ephemeral keys)
@item liveness (keep-alive messages, timeout)
@end itemize

In addition to the CORE-like security benefits, CADET offers other properties
that make it a more universal service than CORE.

@itemize @bullet
@item CADET can establish channels to arbitrary peers in GNUnet. If a peer is
not immediately reachable, CADET will find a path through the network and ask
other peers to retransmit the traffic on its behalf.
@item CADET offers (optional) reliability mechanisms. In a reliable channel
traffic is guaranteed to arrive complete, unchanged and in-order.
@item CADET takes care of flow and congestion control mechanisms, not allowing
the sender to send more traffic than the receiver or the network are able to
process.
@end itemize

@menu
* libgnunetcadet::
@end menu

@node libgnunetcadet
@subsection libgnunetcadet


The CADET API (defined in gnunet_cadet_service.h) is the messaging API used by
P2P applications built using GNUnet. It provides applications the ability to
send and receive encrypted messages to any peer participating in GNUnet. The
API is heavily based on the CORE API.

CADET delivers messages to other peers in "channels". A channel is a permanent
connection defined by a destination peer (identified by its public key) and a
port number. Internally, CADET tunnels all channels towards a destination peer
using one session key and relays the data on multiple "connections",
independent from the channels.

Each channel has optional parameters, the most important being the reliability
flag. Should a message get lost at the TRANSPORT/CORE level, if a channel was
created as reliable, CADET will retransmit the lost message and deliver it
in order to the destination application.

To communicate with other peers using CADET, it is necessary to first connect
to the service using @code{GNUNET_CADET_connect}. This function takes several
parameters in the form of callbacks, to allow the client to react to various
events, like incoming channels or channels that terminate, as well as to
specify a list of ports the client wishes to listen to (at the moment it is
not possible to start listening on further ports once connected, but nothing
prevents a client from connecting several times to CADET, even with one
connection per listening port). The function returns a handle which has to be
used for any further interaction with the service.

To connect to a remote peer a client has to call the
@code{GNUNET_CADET_channel_create} function. The most important parameters
given are the remote peer's identity (its public key) and a port, which
specifies which application on the remote peer to connect to, similar to
TCP/UDP ports. CADET will then find the peer in the GNUnet network, establish
the proper low-level connections and do the necessary key exchanges to assure
authenticated, secure and verified communication. Similar to
@code{GNUNET_CADET_connect}, @code{GNUNET_CADET_channel_create} returns a
handle to interact with the created channel.

For every message the client wants to send to the remote application,
@code{GNUNET_CADET_notify_transmit_ready} must be called, indicating the
channel on which the message should be sent and the size of the message (but
not the message itself!). Once CADET is ready to send the message, the provided
callback will fire, and the message contents are provided to this callback.

Please note that CADET does not provide an explicit notification of when a
channel is connected. In loosely connected networks, like big wireless mesh
networks, this can take several seconds, even minutes in the worst case. To be
alerted when a channel is online, a client can call
@code{GNUNET_CADET_notify_transmit_ready} immediately after
@code{GNUNET_CADET_channel_create}. When the callback is activated, it means
that the channel is online. The callback can give 0 bytes to CADET if no
message is to be sent; this is OK.

If a transmission was requested but is no longer needed before the callback
fires, it can be cancelled with
@code{GNUNET_CADET_notify_transmit_ready_cancel}, which uses the handle given
back by @code{GNUNET_CADET_notify_transmit_ready}. As in the case of CORE, only
one message can be requested at a time: a client must not call
@code{GNUNET_CADET_notify_transmit_ready} again until the callback is called or
the request is cancelled.

When a channel is no longer needed, a client can call
@code{GNUNET_CADET_channel_destroy} to get rid of it. Note that CADET will try
to transmit all pending traffic before notifying the remote peer of the
destruction of the channel, including retransmitting lost messages if the
channel was reliable.

Incoming channels, channels being closed by the remote peer, and traffic on any
incoming or outgoing channels are given to the client when CADET executes the
callbacks given to it at the time of @code{GNUNET_CADET_connect}.

Finally, when an application no longer wants to use CADET, it should call
@code{GNUNET_CADET_disconnect}, but first all channels and pending
transmissions must be closed (otherwise CADET will complain).

@node GNUnet's NSE subsystem
@section GNUnet's NSE subsystem


NSE stands for Network Size Estimation. The NSE subsystem provides other
subsystems and users with a rough estimate of the number of peers currently
participating in the GNUnet overlay. The computed value is not a precise number
as producing a precise number in a decentralized, efficient and secure way is
impossible. While NSE's estimate is inherently imprecise, NSE also gives the
expected range. For a peer that has been running in a stable network for a
while, the real network size will typically (99.7% of the time) be in the range
of [2/3 estimate, 3/2 estimate]. We will now give an overview of the algorithm
used to calculate the estimate; all of the details can be found in this
technical report.

@menu
* Motivation::
* Principle::
* libgnunetnse::
* The NSE Client-Service Protocol::
* The NSE Peer-to-Peer Protocol::
@end menu

@node Motivation
@subsection Motivation


Some subsystems, like DHT, need to know the size of the GNUnet network to
optimize some parameters of their own protocol. The decentralized nature of
GNUnet makes efficiently and securely counting the exact number of peers
infeasible. Although there are several decentralized algorithms to count the
number of peers in a system, so far there is none to do so securely. Other
protocols may allow any malicious peer to manipulate the final result or to
take advantage of the system to perform DoS (Denial of Service) attacks against
the network. GNUnet's NSE protocol avoids these drawbacks.



@menu
* Security::
@end menu

@node Security
@subsubsection Security


The NSE subsystem is designed to be resilient against these attacks. It uses
@uref{http://en.wikipedia.org/wiki/Proof-of-work_system, proofs of work} to
prevent one peer from impersonating a large number of participants, which would
otherwise allow an adversary to artificially inflate the estimate. The DoS
protection comes from the time-based nature of the protocol: the estimates are
calculated periodically and out-of-time traffic is either ignored or stored for
later retransmission by benign peers. In particular, peers cannot trigger
global network communication at will.

@node Principle
@subsection Principle


The algorithm calculates the estimate by finding the globally closest peer ID
to a random, time-based value.

The idea is that the closer the ID is to the random value, the more "densely
packed" the ID space is, and therefore, more peers are in the network.



@menu
* Example::
* Algorithm::
* Target value::
* Timing::
* Controlled Flooding::
* Calculating the estimate::
@end menu

@node Example
@subsubsection Example


Suppose all peers have IDs between 0 and 100 (our ID space), and the random
value is 42. If the closest peer has the ID 70, we can imagine that the average
"distance" between peers is around 30 and therefore there are around 3 peers in
the whole ID space. On the other hand, if the closest peer has the ID 44, we
can imagine that the space is rather packed with peers, maybe as many as 50 of
them. Naturally, we could have been rather unlucky, and there is only one peer,
which happens to have the ID 44. Thus, the current estimate is calculated as
the average over multiple rounds, and not just a single sample.

@node Algorithm
@subsubsection Algorithm


Given that example, one can imagine that the job of the subsystem is to
efficiently communicate the ID of the closest peer to the target value to all
the other peers, who will calculate the estimate from it.

@node Target value
@subsubsection Target value

@c %**end of header

The target value itself is generated by hashing the current time, rounded down
to an agreed value. If the rounding amount is 1h (the default) and the time is
12:34:56, the time to hash would be 12:00:00. The process is repeated each
rounding amount (in this example, every hour). Every repetition is called a
round.

@node Timing
@subsubsection Timing
@c %**end of header

The NSE subsystem has some timing control to avoid everybody broadcasting its
ID all at once. Once each peer has the target random value, it compares its own
ID to the target and calculates the hypothetical size of the network if that
peer were to be the closest. Then it compares the hypothetical size with the
estimate from the previous rounds. For each value there is an associated point
in the period, let's call it "broadcast time". If its own hypothetical estimate
is the same as the previous global estimate, its "broadcast time" will be in
the middle of the round. If it is bigger, it will be earlier, and if it is
smaller (the most likely case), it will be later. This ensures that the peers
closest to the target value start broadcasting their ID first.

@node Controlled Flooding
@subsubsection Controlled Flooding

@c %**end of header

When a peer receives a value, it first verifies that it is closer than the
closest value it had so far; otherwise it answers the incoming message with a
message containing the better value. Then it checks the proof of work that must
be included in the incoming message, to ensure that the other peer's ID is not
made up (otherwise a malicious peer could claim to have an ID of exactly the
target value every round). Once validated, it compares the broadcast time of
the received value with the current time and, if it's not too early, sends the
received value to its neighbors. Otherwise it stores the value until the
correct broadcast time comes. This prevents unnecessary traffic of sub-optimal
values, since a better value can come before the broadcast time, rendering the
previous one obsolete and saving the traffic that would have been used to
broadcast it to the neighbors.

@node Calculating the estimate
@subsubsection Calculating the estimate

@c %**end of header

Once the closest ID has been spread across the network, each peer gets the
exact distance between this ID and the target value of the round and calculates
the estimate with a mathematical formula described in the tech report. The
estimate generated with this method for a single round is not very precise.
Remember the example above, where the only peer has the ID 44 and we happen to
generate the target value 42, concluding there are 50 peers in the network.
Therefore, the NSE subsystem remembers the last 64 estimates and calculates an
average over them, giving a result which usually has one bit of uncertainty
(the real size could be half of the estimate or twice as much). Note that the
actual network size is calculated in powers of two of the raw input, thus one
bit of uncertainty means a factor of two in the size estimate.

@node libgnunetnse
@subsection libgnunetnse

@c %**end of header

The NSE subsystem has the simplest API of all services, with only two calls:
@code{GNUNET_NSE_connect} and @code{GNUNET_NSE_disconnect}.

The connect call gets a callback function as a parameter and this function is
called each time the network agrees on an estimate. This usually happens once
per round, with some exceptions: if the closest peer has a late local clock and
starts spreading its ID after everyone else has agreed on a value, the callback
might be activated twice in a round, the second value always being bigger than
the first. The default round time is set to 1 hour.

The disconnect call disconnects from the NSE subsystem and the callback is no
longer called with new estimates.



@menu
* Results::
* Examples2::
@end menu

@node Results
@subsubsection Results

@c %**end of header

The callback provides two values: the average and the
@uref{http://en.wikipedia.org/wiki/Standard_deviation, standard deviation} of
the last 64 rounds. The values provided by the callback function are
logarithmic; this means that the real estimate numbers can be obtained by
calculating 2 to the power of the given value (2^average). From a statistics
point of view this means that:

@itemize @bullet
@item 68% of the time the real size is included in the interval
[2^(average-stddev), 2^(average+stddev)]
@item 95% of the time the real size is included in the interval
[2^(average-2*stddev), 2^(average+2*stddev)]
@item 99.7% of the time the real size is included in the interval
[2^(average-3*stddev), 2^(average+3*stddev)]
@end itemize

The expected standard deviation for 64 rounds in a network of stable size is
0.2. Thus, we can say that normally:

@itemize @bullet
@item 68% of the time the real size is in the range [-13%, +15%]
@item 95% of the time the real size is in the range [-24%, +32%]
@item 99.7% of the time the real size is in the range [-34%, +52%]
@end itemize

As said in the introduction, we can be quite sure that usually the real size is
between one third and three times the estimate. This can of course vary with
network conditions. Thus, applications may want to also consider the provided
standard deviation value, not only the average (in particular, if the standard
deviation is very high, the average may be meaningless: the network size is
changing rapidly).

@node Examples2
@subsubsection Examples2

@c %**end of header

Let's close with a couple of examples.

@table @asis

@item Average: 10, std dev: 1
Here the estimate would be 2^10 = 1024 peers.
The range in which we can be 95% sure is: [2^8, 2^12] = [256, 4096]. We can be
very (>99.7%) sure that the network is not a hundred peers and absolutely sure
that it is not a million peers, but somewhere around a thousand.

@item Average: 22, std dev: 0.2
Here the estimate would be 2^22 = 4 million peers.
The range in which we can be 99.7% sure is: [2^21.4, 2^22.6] = [2.8M, 6.3M].
We can be sure that the network size is around four million, with absolutely
no way of it being 1 million.

@end table

To put this in perspective: the LHC Higgs boson results were announced with
"5 sigma" and "6 sigma" certainties. In this case a 5 sigma minimum would be
2 million and a 6 sigma minimum, 1.8 million.

@node The NSE Client-Service Protocol
@subsection The NSE Client-Service Protocol

@c %**end of header

As with the API, the client-service protocol is very simple; it only has two
different messages, defined in @code{src/nse/nse.h}:

@itemize @bullet
@item @code{GNUNET_MESSAGE_TYPE_NSE_START}: This message has no parameters and
is sent from the client to the service upon connection.
@item @code{GNUNET_MESSAGE_TYPE_NSE_ESTIMATE}: This message is sent from the
service to the client for every new estimate and upon connection. It contains
a timestamp for the estimate, the average and the standard deviation for the
respective round.
@end itemize

When the @code{GNUNET_NSE_disconnect} API call is executed, the client simply
disconnects from the service, with no message involved.

@node The NSE Peer-to-Peer Protocol
@subsection The NSE Peer-to-Peer Protocol

@c %**end of header

The NSE subsystem only has one message in the P2P protocol, the
@code{GNUNET_MESSAGE_TYPE_NSE_P2P_FLOOD} message.

The key contents of this message are the timestamp to identify the round (differences
in system clocks may cause some peers to send messages way too early or way too
late, so the timestamp allows other peers to identify such messages easily),
the @uref{http://en.wikipedia.org/wiki/Proof-of-work_system, proof of work}
used to make it difficult to mount a
@uref{http://en.wikipedia.org/wiki/Sybil_attack, Sybil attack}, and the public
key, which is used to verify the signature on the message.

Every peer stores a message for the previous, current and next round. The
messages for the previous and current round are given to peers that connect to
us. The message for the next round is simply stored until our system clock
advances to the next round. The message for the current round is what we are
flooding the network with right now. At the beginning of each round the peer
does the following:

@itemize @bullet
@item calculates its own distance to the target value
@item creates, signs and stores the message for the current round (unless it
has a better message in the "next round" slot which came early in the previous
round)
@item calculates, based on the stored round message (own or received), when to
start flooding it to its neighbors
@end itemize

Upon receiving a message the peer checks the validity of the message (round,
proof of work, signature). The next action depends on the contents of the
incoming message:

@itemize @bullet
@item if the message is worse than the current stored message, the peer sends
the current message back immediately, to stop the other peer from spreading
suboptimal results
@item if the message is better than the current stored message, the peer stores
the new message and calculates the new target time to start spreading it to its
neighbors (excluding the one the message came from)
@item if the message is for the previous round, it is compared to the message
stored in the "previous round slot", which may then be updated
@item if the message is for the next round, it is compared to the message
stored in the "next round slot", which again may then be updated
@end itemize

Finally, when it comes time to send the stored message for the current round to
the neighbors, a random delay is added for each neighbor, to avoid traffic
spikes and minimize cross-messages.

@node GNUnet's HOSTLIST subsystem
@section GNUnet's HOSTLIST subsystem

@c %**end of header

Peers in the GNUnet overlay network need address information so that they can
connect with other peers. GNUnet uses so-called HELLO messages to store and
exchange peer addresses. GNUnet provides several methods for peers to obtain
this information:

@itemize @bullet
@item out-of-band exchange of HELLO messages (manually, using for example
gnunet-peerinfo)
@item HELLO messages shipped with GNUnet (automatic with distribution)
@item UDP neighbor discovery in LAN (IPv4 broadcast, IPv6 multicast)
@item topology gossiping (learning from other peers we already connected to),
and
@item the HOSTLIST daemon covered in this section, which is particularly
relevant for bootstrapping new peers.
@end itemize

New peers have no existing connections (and thus cannot learn from gossip among
peers), may not have other peers in their LAN and might be started with an
outdated set of HELLO messages from the distribution. In this case, getting new
peers to connect to the network requires either manual effort or the use of a
HOSTLIST to obtain HELLOs.

@menu
* HELLOs::
* Overview for the HOSTLIST subsystem::
* Interacting with the HOSTLIST daemon::
* Hostlist security address validation::
* The HOSTLIST daemon::
* The HOSTLIST server::
* The HOSTLIST client::
* Usage::
@end menu

@node HELLOs
@subsection HELLOs

@c %**end of header

The basic information peers require to connect to other peers is contained in
so-called HELLO messages, which you can think of as a business card. Besides the
identity of the peer (based on the cryptographic public key) a HELLO message
may contain address information that specifies ways to contact a peer. By
obtaining HELLO messages, a peer can learn how to contact other peers.

@node Overview for the HOSTLIST subsystem
@subsection Overview for the HOSTLIST subsystem

@c %**end of header

The HOSTLIST subsystem provides a way to distribute and obtain contact
information to connect to other peers using a simple HTTP GET request. Its
implementation is split in three parts: the main file for the daemon itself
(gnunet-daemon-hostlist.c), the HTTP client used to download peer information
(hostlist-client.c) and the server component used to provide this information
to other peers (hostlist-server.c). The server is basically a small HTTP web
server (based on GNU libmicrohttpd) which provides a list of HELLOs known to
the local peer for download. The client component is basically an HTTP client
(based on libcurl) which can download hostlists from one or more websites. The
hostlist format is a binary blob containing a sequence of HELLO messages. Note
that any HTTP server can theoretically serve a hostlist; the built-in hostlist
server simply makes it convenient to offer this service.


@menu
* Features::
* Limitations2::
@end menu

@node Features
@subsubsection Features

@c %**end of header

The HOSTLIST daemon can:

@itemize @bullet
@item provide HELLO messages with validated addresses obtained from PEERINFO
for other peers to download
@item download HELLO messages and forward these messages to the TRANSPORT
subsystem for validation
@item advertise the URL of this peer's hostlist address to other peers via
gossip
@item automatically learn about hostlist servers from the gossip of other peers
@end itemize

@node Limitations2
@subsubsection Limitations2

@c %**end of header

The HOSTLIST daemon does not:

@itemize @bullet
@item verify the cryptographic information in the HELLO messages
@item verify the address information in the HELLO messages
@end itemize

@node Interacting with the HOSTLIST daemon
@subsection Interacting with the HOSTLIST daemon

@c %**end of header

The HOSTLIST subsystem is currently implemented as a daemon, so there is no
need for the user to interact with it and therefore there is no command line
tool and no API to communicate with the daemon. In the future, we can envision
changing this to allow users to manually trigger the download of a hostlist.

Since there is no command line interface to interact with HOSTLIST, the only
way to interact with the hostlist is to use STATISTICS to obtain or modify
information about the status of HOSTLIST:
@example
$ gnunet-statistics -s hostlist
@end example

In particular, HOSTLIST includes a @strong{persistent} value in statistics that
specifies when the hostlist server might be queried next. As this value is
exponentially increasing during runtime, developers may want to reset or
manually adjust it. Note that HOSTLIST (but not STATISTICS) needs to be
shut down if changes to this value are to have any effect on the daemon (as
HOSTLIST does not monitor STATISTICS for changes to the download
frequency).

@node Hostlist security address validation
@subsection Hostlist security address validation

@c %**end of header

Since information obtained from other parties cannot be trusted without
validation, we have to distinguish between @emph{validated} and @emph{not
validated} addresses. Before using (and so trusting) information from other
parties, this information has to be double-checked (validated). Address
validation is not done by HOSTLIST but by the TRANSPORT service.

The HOSTLIST component is functionally located between the PEERINFO and the
TRANSPORT subsystem. When acting as a server, the daemon obtains valid
(@emph{validated}) peer information (HELLO messages) from the PEERINFO service
and provides it to other peers. When acting as a client, it contacts the
HOSTLIST servers specified in the configuration, downloads the (unvalidated)
list of HELLO messages and forwards this information to the TRANSPORT service
to validate the addresses.

@node The HOSTLIST daemon
@subsection The HOSTLIST daemon

@c %**end of header

The hostlist daemon is the main component of the HOSTLIST subsystem. It is
started by the ARM service and (if configured) starts the HOSTLIST client and
server components.

If the daemon provides a hostlist itself it can advertise its own hostlist to
other peers. To do so, it sends a GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT
message to other peers when they connect to this peer on the CORE level. This
hostlist advertisement message contains the URL to access the HOSTLIST HTTP
server of the sender. The daemon may also subscribe to this type of message
from the CORE service, and then forwards these messages to the HOSTLIST
client. The client then uses all available URLs to download peer information
when necessary.

When starting, the HOSTLIST daemon first connects to the CORE subsystem and, if
hostlist learning is enabled, registers a CORE handler to receive these kinds
of messages. Next it starts (if configured) the client and server. It passes
pointers to CORE connect and disconnect and receive handlers where the client
and server store their functions, so the daemon can notify them about CORE
events.

To clean up on shutdown, the daemon has a cleaning task, shutting down all
subsystems and disconnecting from CORE.

@node The HOSTLIST server
@subsection The HOSTLIST server

@c %**end of header

The server provides a way for other peers to obtain HELLOs. Basically it is a
small web server that other peers can connect to in order to download a list
of HELLOs using standard HTTP; it may also advertise the URL of the hostlist
to other peers connecting on the CORE level.


@menu
* The HTTP Server::
* Advertising the URL::
@end menu

@node The HTTP Server
@subsubsection The HTTP Server

@c %**end of header

During startup, the server starts a web server listening on the port specified
with the HTTPPORT value (default 8080). In addition it connects to the PEERINFO
service to obtain peer information. The HOSTLIST server uses the
GNUNET_PEERINFO_iterate function to request HELLO information for all peers and
adds their information to a new hostlist if they are suitable (expired
addresses and HELLOs without addresses are both unsuitable) and the maximum
size for a hostlist (MAX_BYTES_PER_HOSTLISTS = 500000 bytes) is not exceeded.
When PEERINFO finishes (with a final NULL callback), the server destroys the
previous
hostlist response available for download on the web server and replaces it with
the updated hostlist. The hostlist format is basically a sequence of HELLO
messages (as obtained from PEERINFO) without any special tokenization. Since
each HELLO message contains a size field, the response can easily be split into
separate HELLO messages by the client.
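Since the response is just a sequence of size-prefixed messages, a client can split it without any delimiter parsing. The following self-contained sketch (illustrative, not actual GNUnet code) assumes the standard 4-byte message header layout: a 16-bit size (which includes the header itself) followed by a 16-bit type, both in network byte order:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohs */

/* Simplified stand-in for struct GNUNET_MessageHeader:
 * 16-bit total size and 16-bit type, network byte order. */
struct MessageHeader
{
  uint16_t size;
  uint16_t type;
};

/* Walk a downloaded hostlist buffer and count the contained
 * messages, using each message's size field to find the next.
 * Returns -1 if the buffer is malformed (truncated message or
 * a size smaller than the header itself). */
int
count_hellos (const char *buf, size_t len)
{
  size_t off = 0;
  int count = 0;

  while (off + sizeof (struct MessageHeader) <= len)
  {
    struct MessageHeader hdr;
    memcpy (&hdr, buf + off, sizeof (hdr));
    uint16_t msize = ntohs (hdr.size);
    if ((msize < sizeof (hdr)) || (off + msize > len))
      return -1; /* malformed */
    count++;
    off += msize;
  }
  return (off == len) ? count : -1;
}
```

A real client would hand each message slice to the HELLO library rather than merely counting them, but the traversal logic is the same.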

A HOSTLIST client connecting to the HOSTLIST server will receive the hostlist
as an HTTP response, and the server will terminate the connection with the
result code HTTP 200 OK. The connection is closed immediately if no hostlist
is available.

@node Advertising the URL
@subsubsection Advertising the URL

@c %**end of header

The server also advertises the URL to download the hostlist to other peers if
hostlist advertisement is enabled. When a new peer connects and has hostlist
learning enabled, the server sends a GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT
message to this peer using the CORE service.

@node The HOSTLIST client
@subsection The HOSTLIST client

@c %**end of header

The client provides the functionality to download the list of HELLOs from a set
of URLs. It performs a standard HTTP request to the URLs configured and learned
from advertisement messages received from other peers. When a HELLO is
downloaded, the HOSTLIST client forwards the HELLO to the TRANSPORT service for
validation.

The client supports two modes of operation: download of HELLOs (bootstrapping)
and learning of URLs.


@menu
* Bootstrapping::
* Learning::
@end menu

@node Bootstrapping
@subsubsection Bootstrapping

@c %**end of header

For bootstrapping, the client schedules a task to download the hostlist from
the set of known URLs. Downloads are only performed if the number of current
connections is smaller than a minimum number of connections (at the moment 4).
The interval between downloads increases exponentially; however, once it
exceeds one hour, the growth is capped at (number of current connections * 1
hour).
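The capping rule could be sketched as follows; the exact schedule used by the real client may differ, so treat the function below as an illustrative assumption rather than the actual implementation:

```c
#include <stdint.h>

#define HOUR_S 3600  /* one hour, in seconds */

/* Illustrative sketch (not the actual GNUnet scheduling code):
 * double the retry interval each round, but once it would exceed
 * one hour, clamp it to (number of current connections * 1 hour),
 * never going below one hour. */
uint64_t
next_interval (uint64_t current_s, unsigned int connections)
{
  uint64_t next = 2 * current_s;
  uint64_t cap = (uint64_t) connections * HOUR_S;

  if (cap < HOUR_S)
    cap = HOUR_S;
  if ((next > HOUR_S) && (next > cap))
    next = cap;
  return next;
}
```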

Once the decision has been taken to download HELLOs, the daemon chooses a
random URL from the list of known URLs. URLs can be configured in the
configuration or be learned from advertisement messages. The client uses an
HTTP client library (libcurl) to initiate the download using the libcurl multi
interface. libcurl passes the data to the callback_download function, which
stores the data in a buffer if space is available and the maximum size for a
hostlist download (MAX_BYTES_PER_HOSTLISTS = 500000 bytes) is not exceeded.
When a full HELLO has been downloaded, the HOSTLIST client offers it to the
TRANSPORT service for validation. When the download finishes or fails,
statistical information about the quality of this URL is updated.

@node Learning
@subsubsection Learning

@c %**end of header

The client also manages hostlist advertisements from other peers. The HOSTLIST
daemon forwards GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT messages to the
client subsystem, which extracts the URL from the message. Next, a test of the
newly obtained URL is performed by triggering a download from the new URL. If
the URL works correctly, it is added to the list of working URLs.

The size of the list of URLs is restricted, so if an additional server is
added while the list is full, the URL with the worst quality ranking
(determined, for example, by the number of successful downloads and the number
of HELLOs obtained) is discarded. During shutdown the list of URLs is saved to
a file for persistence and loaded again on startup. URLs from the
configuration file are never discarded.

@node Usage
@subsection Usage

@c %**end of header

To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES
section for the ARM service. This is done in the default configuration.

For more information on how to configure the HOSTLIST subsystem, see the
installation handbook sections ``Configuring the hostlist to bootstrap'' and
``Configuring your peer to provide a hostlist''.

@node GNUnet's IDENTITY subsystem
@section GNUnet's IDENTITY subsystem

@c %**end of header

Identities of "users" in GNUnet are called egos. Egos can be used as pseudonyms
(fake names) or be tied to an organization (for example, GNU) or even the
actual identity of a human. GNUnet users are expected to have many egos. They
might have one tied to their real identity, some for organizations they manage,
and more for different domains where they want to operate under a pseudonym.

The IDENTITY service allows users to manage their egos. It manages the private
keys of the local user's egos; it does not manage the identities of other
users (public keys). Public keys of other users need names to become
manageable. GNUnet uses the GNU Name System (GNS) to give names to other users
and to manage their public keys securely. This chapter is about the IDENTITY
service, that is, about the management of private keys.

On the network, an ego corresponds to an ECDSA key (over Curve25519, using RFC
6979, as required by GNS). Thus, users can perform actions under a particular
ego by using (signing with) a particular private key. Other users can then
confirm that the action was really performed by that ego by checking the
signature against the respective public key.

The IDENTITY service allows users to associate a human-readable name with each
ego. This way, users can use names that will remind them of the purpose of a
particular ego. The IDENTITY service will store the respective private keys and
allows applications to access key information by name. Users can change the
name that is locally (!) associated with an ego. Egos can also be deleted,
which means that the private key will be removed and it thus will not be
possible to perform actions with that ego in the future.

Additionally, the IDENTITY subsystem can associate service functions with egos.
For example, GNS requires the ego that should be used for the shorten zone. GNS
will ask IDENTITY for an ego for the "gns-short" service. The IDENTITY service
has a mapping of such service strings to the name of the ego that the user
wants to use for this service, for example "my-short-zone-ego".

Finally, the IDENTITY API provides access to a special ego, the anonymous ego.
The anonymous ego is special in that its private key is not really private, but
fixed and known to everyone. Thus, anyone can perform actions as anonymous.
This can be useful as with this trick, code does not have to contain a special
case to distinguish between anonymous and pseudonymous egos.

@menu
* libgnunetidentity::
* The IDENTITY Client-Service Protocol::
@end menu

@node libgnunetidentity
@subsection libgnunetidentity
@c %**end of header


@menu
* Connecting to the service::
* Operations on Egos::
* The anonymous Ego::
* Convenience API to lookup a single ego::
* Associating egos with service functions::
@end menu

@node Connecting to the service
@subsubsection Connecting to the service

@c %**end of header

First, typical clients connect to the identity service using
@code{GNUNET_IDENTITY_connect}. This function takes a callback as a parameter.
If the given callback parameter is non-null, it will be invoked to notify the
application about the current state of the identities in the system.

@itemize @bullet
@item First, it will be invoked on all known egos at the time of the
connection. For each ego, a handle to the ego and the user's name for the ego
will be passed to the callback. Furthermore, a @code{void **} context argument
will be provided which gives the client the opportunity to associate some state
with the ego.
@item Second, the callback will be invoked with NULL for the ego, the name and
the context. This signals that the (initial) iteration over all egos has
completed.
@item Then, the callback will be invoked whenever something changes about an
ego. If an ego is renamed, the callback is invoked with the ego handle of the
ego that was renamed, and the new name. If an ego is deleted, the callback is
invoked with the ego handle and a name of NULL. In the deletion case, the
application should also release resources stored in the context.
@item When the application destroys the connection to the identity service
using @code{GNUNET_IDENTITY_disconnect}, the callback is again invoked with the
ego and a name of NULL (equivalent to deletion of the egos). This should again
be used to clean up the per-ego context.
@end itemize

The ego handle passed to the callback remains valid until the callback is
invoked with a name of NULL, so it is safe to store a reference to the ego's
handle.

@node Operations on Egos
@subsubsection Operations on Egos

@c %**end of header

Given an ego handle, the main operations are to get its associated private key
using @code{GNUNET_IDENTITY_ego_get_private_key} or its associated public key
using @code{GNUNET_IDENTITY_ego_get_public_key}.

The other operations on egos are pretty straightforward. Using
@code{GNUNET_IDENTITY_create}, an application can request the creation of an
ego by specifying the desired name. The operation will fail if that name is
already in use. Using @code{GNUNET_IDENTITY_rename} the name of an existing ego
can be changed. Finally, egos can be deleted using
@code{GNUNET_IDENTITY_delete}. All of these operations will trigger updates to
the callback given to the @code{GNUNET_IDENTITY_connect} function of all
applications that are connected with the identity service at the time.
@code{GNUNET_IDENTITY_cancel} can be used to cancel an operation before the
respective continuation is called. It is not guaranteed that the operation
will not be completed anyway; only the continuation will no longer be called.

@node The anonymous Ego
@subsubsection The anonymous Ego

@c %**end of header

A special way to obtain an ego handle is to call
@code{GNUNET_IDENTITY_ego_get_anonymous}, which returns an ego for the
"anonymous" user --- anyone knows and can get the private key for this user, so
it is suitable for operations that are supposed to be anonymous but require
signatures (for example, to avoid a special path in the code). The anonymous
ego is always valid and accessing it does not require a connection to the
identity service.

@node Convenience API to lookup a single ego
@subsubsection Convenience API to lookup a single ego


As applications commonly just have to look up a single ego, there is a
convenience API to do just that. Use @code{GNUNET_IDENTITY_ego_lookup} to look
up a single ego by name. Note that this is the user's name for the ego, not
the service function. The resulting ego will be returned via a callback and
will only be valid during that callback. The operation can be cancelled via
@code{GNUNET_IDENTITY_ego_lookup_cancel} (cancellation is only legal before the
callback is invoked).

@node Associating egos with service functions
@subsubsection Associating egos with service functions


The @code{GNUNET_IDENTITY_set} function is used to associate a particular ego
with a service function. The name used by the service and the ego are given as
arguments. Afterwards, the service can use its name to lookup the associated
ego using @code{GNUNET_IDENTITY_get}.

@node The IDENTITY Client-Service Protocol
@subsection The IDENTITY Client-Service Protocol

@c %**end of header

A client connecting to the identity service first sends a message with type
@code{GNUNET_MESSAGE_TYPE_IDENTITY_START} to the service. After that, the
client will receive information about changes to the egos by receiving messages
of type @code{GNUNET_MESSAGE_TYPE_IDENTITY_UPDATE}. Those messages contain the
private key of the ego and the user's name of the ego (or zero bytes for the
name to indicate that the ego was deleted). A special bit @code{end_of_list} is
used to indicate the end of the initial iteration over the identity service's
egos.

The client can trigger changes to the egos by sending CREATE, RENAME or DELETE
messages. The CREATE message contains the private key and the desired name. The
RENAME message contains the old name and the new name. The DELETE message only
needs to include the name of the ego to delete. The service responds to each of
these messages with a RESULT_CODE message which indicates success or error of
the operation, and possibly a human-readable error message.

Finally, the client can bind the name of a service function to an ego by
sending a SET_DEFAULT message with the name of the service function and the
private key of the ego. Such bindings can then be resolved using a GET_DEFAULT
message, which includes the name of the service function. The identity service
will respond to a GET_DEFAULT request with a SET_DEFAULT message containing the
respective information, or with a RESULT_CODE to indicate an error.

@node GNUnet's NAMESTORE Subsystem
@section GNUnet's NAMESTORE Subsystem

@c %**end of header

The NAMESTORE subsystem provides persistent storage for local GNS zone
information. All local GNS zone information is managed by NAMESTORE. It
provides both the functionality to administer local GNS information (e.g.
add and delete records) and to retrieve GNS information (e.g. to list name
information in a client). NAMESTORE only manages the persistent storage of
zone information belonging to the user running the service: GNS information
from other users obtained from the DHT is stored by the NAMECACHE subsystem.

NAMESTORE uses a plugin-based database backend to store GNS information with
good performance; SQLite, MySQL and PostgreSQL are supported as database
backends. NAMESTORE clients interact with the IDENTITY subsystem to obtain
cryptographic information about zones based on egos (as described in the
section on the IDENTITY subsystem), but internally NAMESTORE refers to zones
using the ECDSA private key. In addition, it collaborates with the NAMECACHE
subsystem: when local information is modified, the zone information is also
stored in the GNS cache to increase look-up performance for local information.

NAMESTORE provides functionality to look up and store records, to iterate over
a specific zone or all zones, and to monitor zones for changes. NAMESTORE
functionality can be accessed using the NAMESTORE API or the NAMESTORE
command-line tool.

@menu
* libgnunetnamestore::
@end menu

@node libgnunetnamestore
@subsection libgnunetnamestore

@c %**end of header

To interact with NAMESTORE, clients first connect to the NAMESTORE service
using @code{GNUNET_NAMESTORE_connect}, passing a configuration handle. As a
result they obtain a NAMESTORE handle that they can use for operations, or
NULL if the connection failed.

To disconnect from NAMESTORE, clients use @code{GNUNET_NAMESTORE_disconnect}
and specify the handle to disconnect.

NAMESTORE internally uses the ECDSA private key to refer to zones. These
private keys can be obtained from the IDENTITY subsystem. Here @emph{egos}
can be used to refer to zones, or the default ego assigned to the GNS
subsystem can be used to obtain the master zone's private key.


@menu
* Editing Zone Information::
* Iterating Zone Information::
* Monitoring Zone Information::
@end menu

@node Editing Zone Information
@subsubsection Editing Zone Information

@c %**end of header

NAMESTORE provides functions to lookup records stored under a label in a zone
and to store records under a label in a zone.

To store (and delete) records, the client uses the
@code{GNUNET_NAMESTORE_records_store} function and has to provide the
namestore handle to use, the private key of the zone, the label to store the
records under, the records and the number of records, plus a callback
function. After the operation is performed, NAMESTORE calls the provided
callback with the result: GNUNET_SYSERR on failure (including timeout, queue
drop or failure to validate), GNUNET_NO if the content was already there or
was not found, or GNUNET_YES (or another positive value) on success, plus an
additional error message.

Records are deleted by using the store command with 0 records to store. It is
important to note that new records are not merged with records that already
exist under the label. So a client first has to retrieve the existing records,
merge them with the new records, and then store the result.
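This retrieve-merge-store pattern implies a small merging step on the client side. The following is a self-contained illustration with a simplified record type, not the real @code{struct GNUNET_GNSRECORD_Data}:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a GNS record. */
struct Record
{
  uint32_t type;
  char value[32];
};

/* Because a store operation replaces everything under the label,
 * adding one record means: take the existing records, append the
 * new one, and store the combined set.  Writes a freshly
 * allocated array to *out (caller frees) and returns the new
 * record count. */
unsigned int
merge_records (const struct Record *existing, unsigned int n,
               const struct Record *to_add, struct Record **out)
{
  struct Record *merged = malloc ((n + 1) * sizeof (*merged));
  if (n > 0)
    memcpy (merged, existing, n * sizeof (*merged));
  merged[n] = *to_add;
  *out = merged;
  return n + 1;
}
```

A real client would perform the lookup asynchronously, run this merge in the lookup callback, and then issue the store operation with the merged array.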

To perform a lookup operation, the client uses the
@code{GNUNET_NAMESTORE_records_lookup} function. Here the client has to pass
the namestore handle, the private key of the zone and the label. It also has
to provide a callback function which will be called with the result of the
lookup operation: the zone for the records, the label, the records and the
number of records included.

A special operation is used to set the preferred nickname for a zone. This
nickname is stored with the zone and is automatically merged with all labels
and records stored in a zone. Here the client uses the
@code{GNUNET_NAMESTORE_set_nick} function and passes the private key of the
zone and the nickname as a string, plus a callback to be invoked with the
result of the operation.

@node Iterating Zone Information
@subsubsection Iterating Zone Information

@c %**end of header

A client can iterate over all information in a zone, or over all zones managed
by NAMESTORE. Here the client uses the
@code{GNUNET_NAMESTORE_zone_iteration_start} function and passes the namestore
handle, the zone to iterate over and a callback function to call with the
results. If the client wants to iterate over all zones, it passes NULL for the
zone. A @code{GNUNET_NAMESTORE_ZoneIterator} handle is returned, which can be
used to continue the iteration.

NAMESTORE calls the callback for every result and expects the client to call
@code{GNUNET_NAMESTORE_zone_iterator_next} to continue the iteration, or
@code{GNUNET_NAMESTORE_zone_iterator_stop} to interrupt it. When NAMESTORE has
reached the last item, it calls the callback with a NULL value to indicate the
end of the iteration.

@node Monitoring Zone Information
@subsubsection Monitoring Zone Information

@c %**end of header

Clients can also monitor zones to be notified about changes. Here the client
uses the @code{GNUNET_NAMESTORE_zone_monitor_start} function and passes the
private key of the zone and a callback function to call with updates for the
zone. The client can request to obtain the existing zone information first by
iterating over the zone, and can specify a synchronization callback to be
called when the client and the namestore are synced.

On an update, NAMESTORE calls the callback with the private key of the zone,
the label, the records and their number.

To stop monitoring, the client calls @code{GNUNET_NAMESTORE_zone_monitor_stop}
and passes the handle obtained from the function that started the monitoring.

@node GNUnet's PEERINFO subsystem
@section GNUnet's PEERINFO subsystem

@c %**end of header

The PEERINFO subsystem is used to store verified (validated) information about
known peers in a persistent way. It obtains these addresses for example from
the TRANSPORT service, which is in charge of address validation. Validation
means that the information in the HELLO message is checked by connecting to
the addresses and performing a cryptographic handshake to authenticate the
peer instance claiming to be reachable with these addresses. PEERINFO does not
validate the HELLO messages itself; it only stores them and gives them to
interested clients.

As future work, we are considering moving from storing just HELLO messages to
providing a generic persistent per-peer information store. More and more
subsystems tend to need to store per-peer information in a persistent way. To
avoid duplicating this functionality, we plan to provide a PEERSTORE service
for this purpose.

@menu
* Features2::
* Limitations3::
* DeveloperPeer Information::
* Startup::
* Managing Information::
* Obtaining Information::
* The PEERINFO Client-Service Protocol::
* libgnunetpeerinfo::
@end menu

@node Features2
@subsection Features2

@c %**end of header

@itemize @bullet
@item Persistent storage
@item Client notification mechanism on update
@item Periodic clean up for expired information
@item Differentiation between public and friend-only HELLO
@end itemize

@node Limitations3
@subsection Limitations3


@itemize @bullet
@item Does not perform HELLO validation
@end itemize

@node DeveloperPeer Information
@subsection DeveloperPeer Information

@c %**end of header

The PEERINFO subsystem stores this information in the form of HELLO messages,
which you can think of as business cards. These HELLO messages contain the
public key of a peer and the addresses under which the peer can be reached.
The addresses include an expiration date describing how long they are valid.
This information is updated regularly by the TRANSPORT service by revalidating
the addresses. If an address is expired and not renewed, it can be removed
from the HELLO message.

Some peers do not want their HELLO messages distributed to other peers,
especially when GNUnet's friend-to-friend mode is enabled. To prevent this
undesired distribution, PEERINFO distinguishes between @emph{public} and
@emph{friend-only} HELLO messages. Public HELLO messages can be freely
distributed to other (possibly unknown) peers (for example using the hostlist,
gossiping, or broadcasting), whereas friend-only HELLO messages may not be
distributed to other peers. Friend-only HELLO messages have an additional flag
@code{friend_only} set internally; for public HELLO messages this flag is not
set. PEERINFO does not and cannot check whether a client is allowed to obtain
a specific HELLO type.

HELLO messages can be managed using the GNUnet HELLO library. Other GNUnet
subsystems can obtain this information from PEERINFO and use it for their own
purposes. Clients include, for example, the HOSTLIST component, which provides
this information to other peers in the form of a hostlist, and the TRANSPORT
subsystem, which uses this information to maintain connections to other peers.

@node Startup
@subsection Startup

@c %**end of header

During startup the PEERINFO service loads persistent HELLOs from disk. First,
PEERINFO parses the directory configured in the HOSTS value of the
@code{PEERINFO} configuration section, where PEERINFO information is stored.
From all files found in this directory, valid HELLO messages are extracted. In
addition, it loads HELLO messages shipped with the GNUnet distribution. These
HELLOs are used to simplify network bootstrapping by providing valid peer
information with the distribution. The use of these HELLOs can be prevented by
setting @code{USE_INCLUDED_HELLOS} in the @code{PEERINFO} configuration
section to @code{NO}. Files containing invalid information are removed.

@node Managing Information
@subsection Managing Information

@c %**end of header

The PEERINFO service stores information about known peers, keeping a single
HELLO message for every peer. A peer does not need to have a HELLO if no
information is available. HELLO information from different sources, for
example a HELLO obtained from a remote HOSTLIST and a second HELLO stored on
disk, is combined and merged into one single HELLO message per peer, which is
given to clients. During this merge process the HELLO is immediately written
to disk to ensure persistence.

In addition, PEERINFO periodically scans the directory where information is
stored for HELLO messages with expired TRANSPORT addresses. This periodic task
scans all files in the directory and recreates the HELLO messages it finds.
Expired TRANSPORT addresses are removed from the HELLO, and if the HELLO does
not contain any valid addresses, it is discarded and removed from disk.
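The address-expiration step of this cleanup can be sketched as a simple in-place filter; the type and the function below are illustrative stand-ins, not the actual PEERINFO code:

```c
#include <stdint.h>

/* Simplified stand-in for an address inside a HELLO. */
struct Addr
{
  uint64_t expiry; /* absolute expiration time, in seconds */
};

/* Compact the address list in place, dropping expired entries.
 * Returns the number of addresses that remain; a return value
 * of 0 means the whole HELLO should be discarded. */
unsigned int
expire_addresses (struct Addr *addrs, unsigned int n, uint64_t now)
{
  unsigned int kept = 0;

  for (unsigned int i = 0; i < n; i++)
    if (addrs[i].expiry > now)
      addrs[kept++] = addrs[i];
  return kept;
}
```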

@node Obtaining Information
@subsection Obtaining Information

@c %**end of header

When a client requests information from PEERINFO, PEERINFO performs a lookup
for the respective peer (or for all peers, if desired) and transmits this
information to the client. The client can specify whether friend-only HELLOs
are to be included, and PEERINFO filters the respective HELLO messages before
transmitting the information.

To notify clients about changes to PEERINFO information, PEERINFO maintains a
list of clients interested in these notifications. Such a notification occurs
if a HELLO for a peer was updated (due to a merge, for example) or a new peer
was added.

@node The PEERINFO Client-Service Protocol
@subsection The PEERINFO Client-Service Protocol

@c %**end of header

To connect to and disconnect from the PEERINFO service, PEERINFO uses the util
client/server infrastructure, so no special message types are used here.

To add information for a peer, the plain HELLO message is transmitted to the
service without any wrapping. All required information is stored within the
HELLO message. The PEERINFO service provides a message handler accepting and
processing these HELLO messages.

When obtaining PEERINFO information using the iterate functionality, specific
messages are used. To obtain information for all peers, a @code{struct
ListAllPeersMessage} with message type
@code{GNUNET_MESSAGE_TYPE_PEERINFO_GET_ALL} and a flag include_friend_only,
indicating whether friend-only HELLO messages should be included, is
transmitted. If information for a specific peer is required, a @code{struct
ListPeerMessage} with @code{GNUNET_MESSAGE_TYPE_PEERINFO_GET} containing the
peer identity is used.

For both variants the PEERINFO service replies, for each HELLO message it
wants to transmit, with a message of type
@code{GNUNET_MESSAGE_TYPE_PEERINFO_INFO} containing the plain HELLO. The final
message is a @code{struct GNUNET_MessageHeader} with type
@code{GNUNET_MESSAGE_TYPE_PEERINFO_INFO_END}. When the client receives this
message, it can proceed with the next request, if any is pending.

@node libgnunetpeerinfo
@subsection libgnunetpeerinfo

@c %**end of header

The PEERINFO API consists mainly of three different functionalities:
maintaining a connection to the service, adding new information, and
retrieving information from the PEERINFO service.


@menu
* Connecting to the Service::
* Adding Information::
* Obtaining Information2::
@end menu

@node Connecting to the Service
@subsubsection Connecting to the Service

@c %**end of header

To connect to the PEERINFO service, the function @code{GNUNET_PEERINFO_connect}
is used, taking a configuration handle as an argument. To disconnect from
PEERINFO, the function @code{GNUNET_PEERINFO_disconnect} has to be called,
taking the PEERINFO handle returned from the connect function.

@node Adding Information
@subsubsection Adding Information

@c %**end of header

@code{GNUNET_PEERINFO_add_peer} adds a new peer to the PEERINFO subsystem
storage. This function takes the PEERINFO handle as an argument, the HELLO
message to store and a continuation with a closure to be called with the
result of the operation. @code{GNUNET_PEERINFO_add_peer} returns a handle to
this operation, which allows cancelling the operation with the respective
cancel function @code{GNUNET_PEERINFO_add_peer_cancel}. To retrieve
information from PEERINFO, you can iterate over all information stored with
PEERINFO, or you can tell PEERINFO to notify you when new peer information is
available.

@node Obtaining Information2
@subsubsection Obtaining Information2

@c %**end of header

To iterate over information in PEERINFO you use @code{GNUNET_PEERINFO_iterate}.
This function expects the PEERINFO handle, a flag indicating whether
friend-only HELLO messages should be included, a timeout specifying how long
the operation may take, and a callback with a callback closure to be called
for the results. If you want to obtain information for a specific peer, you
can specify the peer identity; if this identity is NULL, information for all
peers is returned. The function returns a handle that allows cancelling the
operation using @code{GNUNET_PEERINFO_iterate_cancel}.

To get notified when peer information changes, you can use
@code{GNUNET_PEERINFO_notify}. This function expects a configuration handle
and a flag indicating whether friend-only HELLO messages should be included.
The PEERINFO service will then call the callback function for every change.
The function returns a handle that can be used to cancel notifications with
@code{GNUNET_PEERINFO_notify_cancel}.


@node GNUnet's PEERSTORE subsystem
@section GNUnet's PEERSTORE subsystem

@c %**end of header

GNUnet's PEERSTORE subsystem offers persistent per-peer storage for other
GNUnet subsystems. GNUnet subsystems can use PEERSTORE to persistently store
and retrieve arbitrary data. Each data record stored with PEERSTORE contains
the following fields:

@itemize @bullet
@item subsystem: Name of the subsystem responsible for the record.
@item peerid: Identity of the peer this record is related to.
@item key: a key string identifying the record.
@item value: binary record value.
@item expiry: record expiry date.
@end itemize
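For illustration, these fields could be viewed as the following C structure; this is a hypothetical in-memory view, as the real service uses its own wire format and GNUnet types (for example @code{struct GNUNET_PeerIdentity} for the peer identity):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical in-memory view of one PEERSTORE record,
 * mirroring the field list above (not the actual wire format). */
struct PeerstoreRecord
{
  char subsystem[32];       /* name of the owning subsystem      */
  unsigned char peerid[32]; /* identity of the related peer      */
  char key[64];             /* key string identifying the record */
  void *value;              /* binary record value               */
  size_t value_size;        /* number of bytes in value          */
  uint64_t expiry;          /* absolute expiration timestamp     */
};
```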

@menu
* Functionality::
* Architecture::
* libgnunetpeerstore::
@end menu

@node Functionality
@subsection Functionality

@c %**end of header

Subsystems can store any type of value under a (subsystem, peerid, key)
combination. A "replace" flag set during store operations forces the PEERSTORE
to replace any old values stored under the same (subsystem, peerid, key)
combination with the new value. Additionally, an expiry date is set after which
the record is *possibly* deleted by PEERSTORE.
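The effect of the "replace" flag can be modeled with a tiny in-memory store; this sketch is purely illustrative, as the real service persists records through its database plugin:

```c
/* Tiny in-memory model of one (subsystem, peerid, key) cell:
 * with replace set, storing overwrites all old values; without
 * it, a second value accumulates alongside the first. */
#define MAX_VALUES 8

struct Cell
{
  const char *values[MAX_VALUES];
  unsigned int n;
};

void
store (struct Cell *c, const char *value, int replace)
{
  if (replace)
    c->n = 0;               /* drop all old values under this key */
  if (c->n < MAX_VALUES)
    c->values[c->n++] = value;
}
```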

Subsystems can iterate over all values stored under any of the following
combination of fields:

@itemize @bullet
@item (subsystem)
@item (subsystem, peerid)
@item (subsystem, key)
@item (subsystem, peerid, key)
@end itemize

Subsystems can also request to be notified about any new values stored under a
(subsystem, peerid, key) combination by sending a "watch" request to
PEERSTORE.

@node Architecture
@subsection Architecture

@c %**end of header

PEERSTORE implements the following components:

@itemize @bullet
@item PEERSTORE service: Handles store, iterate and watch operations.
@item PEERSTORE API: API to be used by other subsystems to communicate and
issue commands to the PEERSTORE service.
@item PEERSTORE plugins: Handles the persistent storage. At the moment, only an
"sqlite" plugin is implemented.
@end itemize

@node libgnunetpeerstore
@subsection libgnunetpeerstore

@c %**end of header

libgnunetpeerstore is the library containing the PEERSTORE API. Subsystems
wishing to communicate with the PEERSTORE service use this API to open a
connection to PEERSTORE. This is done by calling
@code{GNUNET_PEERSTORE_connect} which returns a handle to the newly created
connection. This handle has to be used with any further calls to the API.

To store a new record, the function @code{GNUNET_PEERSTORE_store} is to be used
which requires the record fields and a continuation function that will be
called by the API after the STORE request is sent to the PEERSTORE service.
Note that calling the continuation function does not mean that the record is
successfully stored, only that the STORE request has been successfully sent to
the PEERSTORE service. @code{GNUNET_PEERSTORE_store_cancel} can be called to
cancel the STORE request only before the continuation function has been called.

To iterate over stored records, the function @code{GNUNET_PEERSTORE_iterate}
is to be used. @emph{peerid} and @emph{key} can be set to NULL. An iterator
callback function will be called with each matching record found, and with a
NULL record at the end to signal the end of the result set.
@code{GNUNET_PEERSTORE_iterate_cancel} can be used to cancel the ITERATE
request before the iterator callback is called with a NULL record.

To be notified of new values stored under a (subsystem, peerid, key)
combination, the function @code{GNUNET_PEERSTORE_watch} is to be used. This
will register the watcher with the PEERSTORE service; any new records
matching the given combination will trigger the callback function passed to
@code{GNUNET_PEERSTORE_watch}. This continues until
@code{GNUNET_PEERSTORE_watch_cancel} is called or the connection to the
service is destroyed.

After the connection is no longer needed, the function
@code{GNUNET_PEERSTORE_disconnect} can be called to disconnect from the
PEERSTORE service. Any pending ITERATE or WATCH requests will be destroyed. If
the @code{sync_first} flag is set to @code{GNUNET_YES}, the API will delay the
disconnection until all pending STORE requests are sent to the PEERSTORE
service, otherwise, the pending STORE requests will be destroyed as well.

@node GNUnet's SET Subsystem
@section GNUnet's SET Subsystem

@c %**end of header

The SET service implements efficient set operations between two peers over a
mesh tunnel. Currently, set union and set intersection are the only supported
operations. Elements of a set consist of an @emph{element type} and arbitrary
binary @emph{data}. The size of an element's data is limited to around 62
KB.

@menu
* Local Sets::
* Set Modifications::
* Set Operations::
* Result Elements::
* libgnunetset::
* The SET Client-Service Protocol::
* The SET Intersection Peer-to-Peer Protocol::
* The SET Union Peer-to-Peer Protocol::
@end menu

@node Local Sets
@subsection Local Sets

@c %**end of header

Sets created by a local client can be modified and reused for multiple
operations. As each set operation requires potentially expensive auxiliary
data to be computed for each element of a set, a set can only participate in
one type of set operation (i.e. union or intersection). The type of a set is
determined upon its creation. If the elements of a set are needed for an
operation of a different type, all of the set's elements must be copied to a
new set of the appropriate type.

@node Set Modifications
@subsection Set Modifications

@c %**end of header

Even when set operations are active, one can add to and remove elements from a
set. However, these changes will only be visible to operations that have been
created after the changes have taken place. That is, every set operation only
sees a snapshot of the set from the time the operation was started. This
mechanism is @emph{not} implemented by copying the whole set, but by attaching
@emph{generation information} to each element and operation.

@node Set Operations
@subsection Set Operations

@c %**end of header

Set operations can be started in two ways: either by accepting an operation
request from a remote peer, or by requesting a set operation from a remote
peer. Set operations are uniquely identified by the involved @emph{peers}, an
@emph{application id} and the @emph{operation type}.

The client is notified of incoming set operations by @emph{set listeners}. A
set listener listens for incoming operations of a specific operation type and
application id. Once notified of an incoming set request, the client can
accept the set request (providing a local set for the operation) or reject
it.

@node Result Elements
@subsection Result Elements

@c %**end of header

The SET service has three @emph{result modes} that determine how an operation's
result set is delivered to the client:

@itemize @bullet
@item @strong{Full Result Set.} All elements of the set resulting from the
set operation are returned to the client.
@item @strong{Added Elements.} Only elements that result from the operation
and are not already in the local peer's set are returned. Note that for some
operations (like set intersection) this result mode will never return any
elements. This can be useful if only the remote peer is actually interested
in the result of the set operation.
@item @strong{Removed Elements.} Only elements that are in the local peer's
initial set but not in the operation's result set are returned. Note that for
some operations (like set union) this result mode will never return any
elements. This can be useful if only the remote peer is actually interested
in the result of the set operation.
@end itemize

@node libgnunetset
@subsection libgnunetset

@c %**end of header

@menu
* Sets::
* Listeners::
* Operations::
* Supplying a Set::
* The Result Callback::
@end menu

@node Sets
@subsubsection Sets

@c %**end of header

New sets are created with @code{GNUNET_SET_create}. Both the local peer's
configuration (as each set has its own client connection) and the operation
type must be specified. The set exists until either the client calls
@code{GNUNET_SET_destroy} or the client's connection to the service is
disrupted. In the latter case, the client is notified by the return value of
functions dealing with sets. This return value must always be checked.

Elements are added and removed with @code{GNUNET_SET_add_element} and
@code{GNUNET_SET_remove_element}.

@node Listeners
@subsubsection Listeners

@c %**end of header

Listeners are created with @code{GNUNET_SET_listen}. Each time a remote
peer suggests a set operation with an application id and operation type
matching a listener, the listener's callback is invoked. The client then must
synchronously call either @code{GNUNET_SET_accept} or @code{GNUNET_SET_reject}.
Note that the operation will not be started until the client calls
@code{GNUNET_SET_commit} (see Section "Supplying a Set").

@node Operations
@subsubsection Operations

@c %**end of header

Operations to be initiated by the local peer are created with
@code{GNUNET_SET_prepare}. Note that the operation will not be started until
the client calls @code{GNUNET_SET_commit} (see Section "Supplying a
Set").

@node Supplying a Set
@subsubsection Supplying a Set

@c %**end of header

To create symmetry between the two ways of starting a set operation (accepting
and initiating it), the operation handles returned by @code{GNUNET_SET_accept}
and @code{GNUNET_SET_prepare} do not yet have a set to operate on, thus they
cannot do any work yet.

The client must call @code{GNUNET_SET_commit} to specify a set to use for an
operation. @code{GNUNET_SET_commit} may only be called once per set
operation.

@node The Result Callback
@subsubsection The Result Callback

@c %**end of header

Clients must specify both a result mode and a result callback with
@code{GNUNET_SET_accept} and @code{GNUNET_SET_prepare}. The result callback
is invoked with a status indicating either that an element was received, or
that the operation failed or succeeded. The interpretation of the received
element depends on the result mode. The callback needs to know which result
mode it is used in, as the arguments do not indicate whether an element is
part of the full result set, or whether it is in the difference between the
original set and the final set.

@node The SET Client-Service Protocol
@subsection The SET Client-Service Protocol

@c %**end of header

@menu
* Creating Sets::
* Listeners2::
* Initiating Operations::
* Modifying Sets::
* Results and Operation Status::
* Iterating Sets::
@end menu

@node Creating Sets
@subsubsection Creating Sets

@c %**end of header

For each set of a client, there exists a client connection to the service. Sets
are created by sending the @code{GNUNET_SERVICE_SET_CREATE} message over a new
client connection. Multiple operations for one set are multiplexed over one
client connection, using a request id supplied by the client.

@node Listeners2
@subsubsection Listeners2

@c %**end of header

Each listener also requires a separate client connection. By sending the
@code{GNUNET_SERVICE_SET_LISTEN} message, the client notifies the service of
the application id and operation type it is interested in. A client rejects
an incoming request by sending @code{GNUNET_SERVICE_SET_REJECT} on the
listener's client connection. In contrast, when accepting an incoming
request, a @code{GNUNET_SERVICE_SET_ACCEPT} message must be sent over the
client connection of the set that is supplied for the set operation.

@node Initiating Operations
@subsubsection Initiating Operations

@c %**end of header

Operations with remote peers are initiated by sending a
@code{GNUNET_SERVICE_SET_EVALUATE} message to the service. The client
connection over which this message is sent determines the set to use.

@node Modifying Sets
@subsubsection Modifying Sets

@c %**end of header

Sets are modified with the @code{GNUNET_SERVICE_SET_ADD} and
@code{GNUNET_SERVICE_SET_REMOVE} messages.


@c %@menu
@c %* Results and Operation Status::
@c %* Iterating Sets::
@c %@end menu   

@node Results and Operation Status
@subsubsection Results and Operation Status
@c %**end of header

The service notifies the client of result elements and success/failure of a set
operation with the @code{GNUNET_SERVICE_SET_RESULT} message.

@node Iterating Sets
@subsubsection Iterating Sets

@c %**end of header

All elements of a set can be requested by sending
@code{GNUNET_SERVICE_SET_ITER_REQUEST}. The server responds with
@code{GNUNET_SERVICE_SET_ITER_ELEMENT} and eventually terminates the iteration
with @code{GNUNET_SERVICE_SET_ITER_DONE}. After each received element, the
client must send @code{GNUNET_SERVICE_SET_ITER_ACK}. Note that only one set
iteration may be active for a set at any given time.

@node The SET Intersection Peer-to-Peer Protocol
@subsection The SET Intersection Peer-to-Peer Protocol

@c %**end of header

The intersection protocol operates over CADET and starts with a
GNUNET_MESSAGE_TYPE_SET_P2P_OPERATION_REQUEST being sent by the peer initiating
the operation to the peer listening for inbound requests. It includes the
number of elements of the initiating peer, which is used to decide which side
will send a Bloom filter first.

The listening peer checks if the operation type and application identifier are
acceptable for its current state. If not, it responds with a
GNUNET_MESSAGE_TYPE_SET_RESULT and a status of GNUNET_SET_STATUS_FAILURE (and
terminates the CADET channel).

If the application accepts the request, the listener sends back a
GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_ELEMENT_INFO if it has more elements
in the set than the client. Otherwise, it immediately starts with the Bloom
filter exchange. If the initiator receives a
GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_ELEMENT_INFO response, it begins the
Bloom filter exchange, unless the set size is indicated to be zero, in which
case the intersection is considered finished after just the initial
handshake.


@menu
* The Bloom filter exchange::
* Salt::
@end menu

@node The Bloom filter exchange
@subsubsection The Bloom filter exchange

@c %**end of header

In this phase, each peer transmits a Bloom filter over the remaining keys of
the local set to the other peer using a
GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_BF message. This message additionally
includes the number of elements left in the sender's set, as well as the XOR
over all of the keys in that set.

The number of bits 'k' set per element in the Bloom filter is calculated based
on the relative size of the two sets. Furthermore, the size of the Bloom filter
is calculated based on 'k' and the number of elements in the set to maximize
the amount of data filtered per byte transmitted on the wire (while avoiding an
excessively high number of iterations).

The receiver of the message removes all elements from its local set that do
not pass the Bloom filter test. It then checks if the set size of the sender
and the XOR over the keys match what is left of its own set. If they do, it
sends a GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_DONE back to indicate that
the latest set is the final result. Otherwise, the receiver starts another
Bloom filter exchange, except this time as the sender.

@node Salt
@subsubsection Salt

@c %**end of header

Bloom filter operations are probabilistic: with some non-zero probability the
test may incorrectly say an element is in the set, even though it is not.

To mitigate this problem, the intersection protocol iterates exchanging Bloom
filters using a different random 32-bit salt in each iteration (the salt is
also included in the message). With different salts, set operations may fail
for different elements. Merging the results from the executions, the
probability of failure drops to zero.

The iterations terminate once both peers have established that they have
sets of the same size, and where the XOR over all keys computes the same
512-bit value (leaving a failure probability of 2^-511).

@node The SET Union Peer-to-Peer Protocol
@subsection The SET Union Peer-to-Peer Protocol

@c %**end of header

The SET union protocol is based on Eppstein's efficient set reconciliation
without prior context. You should read this paper first if you want to
understand the protocol.

The union protocol operates over CADET and starts with a
GNUNET_MESSAGE_TYPE_SET_P2P_OPERATION_REQUEST being sent by the peer initiating
the operation to the peer listening for inbound requests. It includes the
number of elements of the initiating peer, which is currently not used.

The listening peer checks if the operation type and application identifier are
acceptable for its current state. If not, it responds with a
GNUNET_MESSAGE_TYPE_SET_RESULT and a status of GNUNET_SET_STATUS_FAILURE (and
terminates the CADET channel).

If the application accepts the request, it sends back a strata estimator using
a message of type GNUNET_MESSAGE_TYPE_SET_UNION_P2P_SE. The initiator evaluates
the strata estimator and initiates the exchange of invertible Bloom filters,
sending a GNUNET_MESSAGE_TYPE_SET_UNION_P2P_IBF.

During the IBF exchange, if the receiver cannot invert the Bloom filter or
detects a cycle, it sends a larger IBF in response (up to a defined maximum
limit; if that limit is reached, the operation fails). Elements decoded while
processing the IBF are transmitted to the other peer using
GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENTS, or requested from the other peer using
GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENT_REQUESTS messages, depending on the sign
observed during decoding of the IBF. Peers respond to a
GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENT_REQUESTS message with the respective
element in a GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENTS message. If the IBF fully
decodes, the peer responds with a GNUNET_MESSAGE_TYPE_SET_UNION_P2P_DONE
message instead of another GNUNET_MESSAGE_TYPE_SET_UNION_P2P_IBF.

All Bloom filter operations use a salt to mingle keys before hashing them into
buckets, such that future iterations have a fresh chance of succeeding if they
failed due to collisions before.

@node GNUnet's STATISTICS subsystem
@section GNUnet's STATISTICS subsystem

@c %**end of header

In GNUnet, the STATISTICS subsystem offers a central place for all subsystems
to publish unsigned 64-bit integer run-time statistics. Keeping this
information centrally means that there is a unified way for the user to obtain
data on all subsystems, and individual subsystems do not have to always include
a custom data export method for performance metrics and other statistics. For
example, the TRANSPORT system uses STATISTICS to update information about the
number of directly connected peers and the bandwidth that has been consumed by
the various plugins. This information is valuable for diagnosing connectivity
and performance issues.

Following the GNUnet service architecture, the STATISTICS subsystem is divided
into an API which is exposed through the header
@strong{gnunet_statistics_service.h} and the STATISTICS service
@strong{gnunet-service-statistics}. The @strong{gnunet-statistics} command-line
tool can be used to obtain (and change) information about the values stored by
the STATISTICS service. The STATISTICS service does not communicate with other
peers.

Data is stored in the STATISTICS service in the form of tuples
@strong{(subsystem, name, value, persistence)}. The subsystem determines to
which GNUnet subsystem the data belongs. The name is the label under which
the value is stored; it uniquely identifies the record from among other
records belonging to the same subsystem. In some parts of the code, the pair
@strong{(subsystem, name)} is called a @strong{statistic} as it identifies
the values stored in the STATISTICS service. The persistence flag determines
whether the record has to be preserved across service restarts. A record is
said to be persistent if this flag is set for it; if not, the record is
treated as a non-persistent record and is lost after a service restart.
Persistent records are written to and read from the file
@strong{statistics.data} before shutdown and upon startup. The file is
located in the HOME directory of the peer.

An anomaly of the STATISTICS service is that it does not terminate immediately
upon receiving a shutdown signal if it has any clients connected to it. It
waits for all the clients that are not monitors to close their connections
before terminating itself. This is to prevent the loss of data during peer
shutdown --- delaying the STATISTICS service shutdown helps other services to
store important data to STATISTICS during shutdown.

@menu
* libgnunetstatistics::
* The STATISTICS Client-Service Protocol::
@end menu

@node libgnunetstatistics
@subsection libgnunetstatistics

@c %**end of header

@strong{libgnunetstatistics} is the library containing the API for the
STATISTICS subsystem. Any process that wants to use STATISTICS should use
this API to open a connection to the STATISTICS service. This is done by
calling the function @code{GNUNET_STATISTICS_create()}. This function takes
the name of the subsystem that is trying to use STATISTICS and a
configuration. All values written to STATISTICS with this connection will be
placed in the section corresponding to the given subsystem's name. The
connection to STATISTICS can be destroyed with the function
@code{GNUNET_STATISTICS_destroy()}. This function allows the connection to be
destroyed immediately or upon transferring all pending write requests to the
service.

Note: the STATISTICS subsystem can be disabled by setting @code{DISABLE = YES}
under the @code{[STATISTICS]} section in the configuration. With such a
configuration all calls to @code{GNUNET_STATISTICS_create()} return @code{NULL}
as the STATISTICS subsystem is unavailable and no other functions from the API
can be used.


@menu
* Statistics retrieval::
* Setting statistics and updating them::
* Watches::
@end menu

@node Statistics retrieval
@subsubsection Statistics retrieval

@c %**end of header

Once a connection to the statistics service is obtained, information about
any other subsystem which uses statistics can be retrieved with the function
@code{GNUNET_STATISTICS_get()}. This function takes the connection handle,
the name of the subsystem whose information we are interested in (a
@code{NULL} value will retrieve information of all available subsystems using
STATISTICS), the name of the statistic we are interested in (a @code{NULL}
value will retrieve all available statistics), a continuation callback which
is called when all of the requested information has been retrieved, an
iterator callback which is called for each parameter in the retrieved
information, and a closure for the aforementioned callbacks. The library
then invokes the iterator callback for each value matching the request.

A call to @code{GNUNET_STATISTICS_get()} is asynchronous and can be canceled
with the function @code{GNUNET_STATISTICS_get_cancel()}. This is helpful when
retrieving statistics takes too long, and especially when we want to shut
down and clean everything up.

@node Setting statistics and updating them
@subsubsection Setting statistics and updating them

@c %**end of header

So far we have seen how to retrieve statistics; here we will learn how to
set statistics and update them so that other subsystems can retrieve them.

A new statistic can be set using the function @code{GNUNET_STATISTICS_set()}.
This function takes the name of the statistic and its value and a flag to make
the statistic persistent. The value of the statistic should be of the type
@code{uint64_t}. The function does not take the name of the subsystem; it is
determined from the previous @code{GNUNET_STATISTICS_create()} invocation. If
the given statistic is already present, its value is overwritten.

An existing statistic can be updated, i.e. its value can be increased or
decreased by an amount, with the function @code{GNUNET_STATISTICS_update()}.
The parameters to this function are similar to
@code{GNUNET_STATISTICS_set()}, except that it takes the amount to be changed
as a value of type @code{int64_t} instead of the absolute value.

The library will combine multiple set or update operations into one message if
the client performs requests at a rate that is faster than the available IPC
with the STATISTICS service. Thus, the client does not have to worry about
sending requests too quickly.

@node Watches
@subsubsection Watches

@c %**end of header

An interesting feature of STATISTICS is that it can serve notifications
whenever a statistic of interest is modified. This is achieved by registering a watch
through the function @code{GNUNET_STATISTICS_watch()}. The parameters of this
function are similar to those of @code{GNUNET_STATISTICS_get()}. Changes to the
respective statistic's value will then cause the given iterator callback to be
called. Note: A watch can only be registered for a specific statistic. Hence
the subsystem name and the parameter name cannot be @code{NULL} in a call to
@code{GNUNET_STATISTICS_watch()}.

A registered watch will keep notifying any value changes until
@code{GNUNET_STATISTICS_watch_cancel()} is called with the same parameters that
are used for registering the watch.

@node The STATISTICS Client-Service Protocol
@subsection The STATISTICS Client-Service Protocol
@c %**end of header


@menu
* Statistics retrieval2::
* Setting and updating statistics::
* Watching for updates::
@end menu

@node Statistics retrieval2
@subsubsection Statistics retrieval2

@c %**end of header

To retrieve statistics, the client transmits a message of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_GET} containing the given subsystem name
and statistic parameter to the STATISTICS service. The service responds with
a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_VALUE} for each of the
statistics parameters that match the client's request. The end of the
retrieved information is signaled by the service by sending a message of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_END}.

@node Setting and updating statistics
@subsubsection Setting and updating statistics

@c %**end of header

The subsystem name, parameter name, its value and the persistence flag are
communicated to the service through the message
@code{GNUNET_MESSAGE_TYPE_STATISTICS_SET}.

When the service receives a message of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_SET}, it retrieves the subsystem name
and checks for a statistic parameter matching the name given in the message.
If a statistic parameter is found, its value is overwritten by the new value
from the message; if none is found, a new statistic parameter is created with
the given name and value.

In addition to just setting an absolute value, it is possible to perform a
relative update by sending a message of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_SET} with an update flag
(@code{GNUNET_STATISTICS_SETFLAG_RELATIVE}) signifying that the value in the
message should be treated as an update value.

@node Watching for updates
@subsubsection Watching for updates

@c %**end of header

@code{GNUNET_STATISTICS_watch()} registers the watch at the service by
sending a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_WATCH}. The
service then sends notifications through messages of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_WATCH_VALUE} whenever the statistic
parameter's value changes.

@node GNUnet's Distributed Hash Table (DHT)
@section GNUnet's Distributed Hash Table (DHT)

@c %**end of header

GNUnet includes a generic distributed hash table that can be used by developers
building P2P applications in the framework. This section documents high-level
features and how developers are expected to use the DHT. We have a research
paper detailing how the DHT works. Also, Nate's thesis includes a detailed
description and performance analysis (in chapter 6).

Key features of GNUnet's DHT include:

@itemize @bullet
@item stores key-value pairs with values up to (approximately) 63k in size
@item works with many underlay network topologies (small-world, random graph),
underlay does not need to be a full mesh / clique
@item support for extended queries (more than just a simple 'key'), filtering
duplicate replies within the network (bloomfilter) and content validation (for
details, please read the subsection on the block library)
@item can (optionally) return paths taken by the PUT and GET operations to the
application
@item provides content replication to handle churn
@end itemize

GNUnet's DHT is randomized and unreliable. Unreliable means that there is no
strict guarantee that a value stored in the DHT is always found --- values are
only found with high probability. While this is somewhat true in all P2P DHTs,
GNUnet developers should be particularly wary of this fact (this will help you
write secure, fault-tolerant code). Thus, when writing any application using
the DHT, you should always consider the possibility that a value stored in the
DHT by you or some other peer might simply not be returned, or returned with a
significant delay. Your application logic must be written to tolerate this
(naturally, some loss of performance or quality of service is expected in this
case).

@menu
* Block library and plugins::
* libgnunetdht::
* The DHT Client-Service Protocol::
* The DHT Peer-to-Peer Protocol::
@end menu

@node Block library and plugins
@subsection Block library and plugins

@c %**end of header

@menu
* What is a Block?::
* The API of libgnunetblock::
* Queries::
* Sample Code::
* Conclusion2::
@end menu

@node What is a Block?
@subsubsection What is a Block?

@c %**end of header

Blocks are small (< 63k) pieces of data stored under a key (struct
GNUNET_HashCode). Blocks have a type (enum GNUNET_BlockType) which defines
their data format. Blocks are used in GNUnet as units of static data exchanged
between peers and stored (or cached) locally. Uses of blocks include
file-sharing (the files are broken up into blocks), the VPN (DNS information is
stored in blocks) and the DHT (all information in the DHT and meta-information
for the maintenance of the DHT are both stored using blocks). The block
subsystem provides a few common functions that must be available for any type
of block.

@node The API of libgnunetblock
@subsubsection The API of libgnunetblock

@c %**end of header

The block library requires for each (family of) block type(s) a block plugin
(implementing gnunet_block_plugin.h) that provides basic functions needed by
the DHT (and possibly other subsystems) to manage the block. These block
plugins are typically implemented within their respective subsystems.
The main block library is then used to locate, load and query the appropriate
block plugin. Which plugin is appropriate is determined by the block type
(which is just a 32-bit integer). Block plugins contain code that specifies
which block types are supported by a given plugin. The block library loads
all block plugins that are installed at the local peer and forwards the
application request to the respective plugin.

The central functions of the block APIs (plugin and main library) are to allow
the mapping of blocks to their respective key (if possible) and the ability to
check that a block is well-formed and matches a given request (again, if
possible). This way, GNUnet can avoid storing invalid blocks, storing blocks
under the wrong key and forwarding blocks in response to a query that they do
not answer.

One key function of block plugins is to allow GNUnet to detect duplicate
replies (via the Bloom filter). All plugins MUST support detecting duplicate
replies (by adding the current response to the Bloom filter and rejecting it
if it is encountered again). If a plugin fails to do this, responses may loop
in the network.

@node Queries
@subsubsection Queries
@c %**end of header

The query format for any block in GNUnet consists of four main components.
First, the type of the desired block must be specified. Second, the query
must contain a hash code. The hash code is used for lookups in hash tables
and databases and need not be unique for the block (however, if possible a
unique hash should be used, as this is best for performance). Third, an
optional Bloom filter can be specified to exclude known results; replies
that hash to the bits set in the Bloom filter are considered invalid.
False-positives can be eliminated by sending the same query again with a
different Bloom filter mutator value, which parameterizes the hash function
that is used. Finally, an optional application-specific "eXtended query"
(xquery) can be specified to further constrain the results. It is entirely
up to the type-specific plugin to determine whether or not a given block
matches a query (type, hash, Bloom filter, and xquery). Naturally, not all
xqueries are valid, and some types of blocks may not support Bloom filters
either, so the plugin also needs to check if the query is valid in the first
place.

Depending on the results from the plugin, the DHT will then discard the
(invalid) query, forward the query, discard the (invalid) reply, cache the
(valid) reply, and/or forward the (valid and non-duplicate) reply.

@node Sample Code
@subsubsection Sample Code

@c %**end of header

The source code in @strong{plugin_block_test.c} is a good starting point for
new block plugins --- it does the minimal work by implementing a plugin that
performs no validation at all. The respective @strong{Makefile.am} shows how to
build and install a block plugin.

@node Conclusion2
@subsubsection Conclusion2

@c %**end of header

In conclusion, GNUnet subsystems that want to use the DHT need to define a
block format and write a plugin to match queries and replies. For testing, the
"GNUNET_BLOCK_TYPE_TEST" block type can be used; it accepts any query as valid
and any reply as matching any query. This type is also used for the DHT command
line tools. However, it should NOT be used for normal applications due to the
lack of error checking that results from this primitive implementation.

@node libgnunetdht
@subsection libgnunetdht

@c %**end of header

The DHT API itself is pretty simple and offers the usual GET and PUT functions
that work as expected. The specified block type refers to the block library
which allows the DHT to run application-specific logic for data stored in the
network.


@menu
* GET::
* PUT::
* MONITOR::
* DHT Routing Options::
@end menu

@node GET
@subsubsection GET

@c %**end of header

When using GET, the main consideration for developers (other than the block
library) should be that after issuing a GET, the DHT will continuously cause
(small amounts of) network traffic until the operation is explicitly canceled.
So GET does not simply send out a single network request once; instead, the
DHT will continue to search for data. This is needed to achieve good success
rates and also handles the case where the respective PUT operation happens
after the GET operation was started. Developers should not cancel an existing
GET operation and then explicitly re-start it to trigger a new round of
network requests; this is simply inefficient, especially as the internal
automated version can be more efficient, for example by filtering results in
the network that have already been returned.

If an application that performs a GET request has a set of replies that it
already knows and would like to filter, it can call
@code{GNUNET_DHT_get_filter_known_results} with an array of hashes over the
respective blocks to tell the DHT that these results are not desired (any
more). This way, the DHT will filter the respective blocks using the block
library in the network, which may result in a significant reduction in
bandwidth consumption.

@node PUT
@subsubsection PUT

@c %**end of header

In contrast to GET operations, developers @strong{must} manually re-run PUT
operations periodically (if they intend the content to continue to be
available). Content stored in the DHT expires or might be lost due to churn.
Furthermore, GNUnet's DHT typically requires multiple rounds of PUT operations
before a key-value pair is consistently available to all peers (the DHT
randomizes paths and thus storage locations, and only after multiple rounds of
PUTs will there be a sufficient number of replicas in large DHTs). An explicit
PUT operation using the DHT API will only cause network traffic once, so in
order to ensure basic availability and resistance to churn (and adversaries),
PUTs must be repeated. While the exact frequency depends on the application, a
rule of thumb is that there should be at least a dozen PUT operations within
the content lifetime. Content in the DHT typically expires after one day, so
DHT PUT operations should be repeated at least every 1-2 hours.
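As a rough illustration of this rule of thumb, the repetition period can be
derived from the content lifetime. This is a minimal sketch; the constant and
helper function are hypothetical, not part of the GNUnet API:

```c
#include <stdint.h>

/* Simplified model (not the GNUnet API): choose a PUT repetition
 * interval so that at least MIN_PUTS PUT operations happen within
 * the content lifetime.  With the typical one-day expiration this
 * yields a period of two hours, consistent with the 1-2 hour rule
 * of thumb stated above. */
#define MIN_PUTS 12

uint32_t
put_interval_seconds (uint32_t content_lifetime_seconds)
{
  return content_lifetime_seconds / MIN_PUTS;
}
```

For the default one-day content lifetime, this yields a two-hour repetition
period, at the conservative end of the 1-2 hour recommendation.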

@node MONITOR
@subsubsection MONITOR

@c %**end of header

The DHT API also allows applications to monitor messages crossing the local
DHT service. The types of messages used by the DHT are GET, PUT and RESULT
messages. Using the monitoring API, applications can choose to monitor these
requests, possibly limiting themselves to requests for a particular block
type.

The monitoring API is not only useful for diagnostics, it can also be used
to trigger application operations based on PUT operations. For example, an
application may use PUTs to distribute work requests to other peers. The
workers would then monitor for PUTs that give them work, instead of looking
for work using GET operations. This can be beneficial, especially if the
workers have no good way to guess the keys under which work would be stored.
Naturally, additional protocols might be needed to ensure that the desired
number of workers will process the distributed workload.

@node DHT Routing Options
@subsubsection DHT Routing Options

@c %**end of header

There are two important options for GET and PUT requests:

@table @asis
@item GNUNET_DHT_RO_DEMULTIPLEX_EVERYWHERE This option means that all peers
should process the request, even if their peer ID is not closest to the key.
For a PUT request, this means that all peers that a request traverses may make
a copy of the data. Similarly for a GET request, all peers will check their
local database for a result. Setting this option can thus significantly improve
caching and reduce bandwidth consumption --- at the expense of a larger DHT
database. If in doubt, we recommend that this option should be used.
@item GNUNET_DHT_RO_RECORD_ROUTE This option instructs the DHT to record the path
that a GET or a PUT request is taking through the overlay network. The
resulting paths are then returned to the application with the respective
result. This allows the receiver of a result to construct a path to the
originator of the data, which might then be used for routing. Naturally,
setting this option requires additional bandwidth and disk space, so
applications should only set this if the paths are needed by the application
logic.
@item GNUNET_DHT_RO_FIND_PEER This option is an internal option used by
the DHT's peer discovery mechanism and should not be used by applications.
@item GNUNET_DHT_RO_BART This option is currently not implemented. It may in
the future offer performance improvements for clique topologies.
@end table

@node The DHT Client-Service Protocol
@subsection The DHT Client-Service Protocol

@c %**end of header

@menu
* PUTting data into the DHT::
* GETting data from the DHT::
* Monitoring the DHT::
@end menu

@node PUTting data into the DHT
@subsubsection PUTting data into the DHT

@c %**end of header

To store (PUT) data into the DHT, the client sends a@ @code{struct
GNUNET_DHT_ClientPutMessage} to the service. This message specifies the block
type, routing options, the desired replication level, the expiration time, key,
value and a 64-bit unique ID for the operation. The service responds with a@
@code{struct GNUNET_DHT_ClientPutConfirmationMessage} with the same 64-bit
unique ID. Note that the service sends the confirmation as soon as it has
locally processed the PUT request. The PUT may still be propagating through the
network at this time.

In the future, we may want to change this to provide (limited) feedback to the
client, for example if we detect that the PUT operation had no effect because
the same key-value pair was already stored in the DHT. However, changing this
would also require additional state and messages in the P2P
interaction.

@node GETting data from the DHT
@subsubsection GETting data from the DHT

@c %**end of header

To retrieve (GET) data from the DHT, the client sends a@ @code{struct
GNUNET_DHT_ClientGetMessage} to the service. The message specifies routing
options, a replication level (for replicating the GET, not the content), the
desired block type, the key, the (optional) extended query and unique 64-bit
request ID.

Additionally, the client may send any number of@ @code{struct
GNUNET_DHT_ClientGetResultSeenMessage}s to notify the service about results
that the client is already aware of. These messages consist of the key, the
unique 64-bit ID of the request, and an arbitrary number of hash codes over the
blocks that the client is already aware of. As messages are restricted to 64k,
a client that already knows more than about a thousand blocks may need to send
several of these messages. Naturally, the client should transmit these messages
as quickly as possible after the original GET request so that the DHT can
filter those results in the network early on. Still, as these messages are
sent after the original request, it is conceivable that the DHT service may
return blocks that match those already known to the client anyway.

In response to a GET request, the service will send @code{struct
GNUNET_DHT_ClientResultMessage}s to the client. These messages contain the
block type, expiration, key, unique ID of the request and of course the value
(a block). Depending on the options set for the respective operations, the
replies may also contain the path the GET and/or the PUT took through the
network.

A client can stop receiving replies either by disconnecting or by sending a
@code{struct GNUNET_DHT_ClientGetStopMessage} which must contain the key and
the 64-bit unique ID of the original request. Using an explicit "stop" message
is more common as this allows a client to run many concurrent GET operations
over the same connection with the DHT service --- and to stop them
individually.

@node Monitoring the DHT
@subsubsection Monitoring the DHT

@c %**end of header

To begin monitoring, the client sends a @code{struct
GNUNET_DHT_MonitorStartStop} message to the DHT service. In this message, flags
can be set to enable (or disable) monitoring of GET, PUT and RESULT messages
that pass through a peer. The message can also restrict monitoring to a
particular block type or a particular key. Once monitoring is enabled, the DHT
service will notify the client about any matching event using @code{struct
GNUNET_DHT_MonitorGetMessage}s for GET events, @code{struct
GNUNET_DHT_MonitorPutMessage} for PUT events and@ @code{struct
GNUNET_DHT_MonitorGetRespMessage} for RESULTs. Each of these messages contains
all of the information about the event.

@node The DHT Peer-to-Peer Protocol
@subsection The DHT Peer-to-Peer Protocol
@c %**end of header


@menu
* Routing GETs or PUTs::
* PUTting data into the DHT2::
* GETting data from the DHT2::
@end menu

@node Routing GETs or PUTs
@subsubsection Routing GETs or PUTs

@c %**end of header

When routing GETs or PUTs, the DHT service selects a suitable subset of
neighbours for forwarding. The exact number of neighbours can be zero or more
and depends on the hop counter of the query (initially zero) in relation to the
(log of) the network size estimate, the desired replication level and the
peer's connectivity. Depending on the hop counter and our network size
estimate, the selection of the peers may be randomized or based on proximity
to the key. Furthermore, requests include a set of peers that a request has
already traversed; those peers are also excluded from the selection.
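The hop-counter-based switch described above can be sketched as follows. This
is a deliberately simplified model with hypothetical helper functions; the
actual selection logic in the DHT service is considerably more involved:

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified model of the forwarding decision: during roughly the
 * first log(n) hops the query may be forwarded randomly; once the
 * hop counter exceeds the (log of the) network size estimate, a
 * peer switches to selecting neighbours by proximity to the key. */
bool
select_by_proximity (uint32_t hop_count, uint32_t log_network_size)
{
  return hop_count > log_network_size;
}

/* XOR distance as an illustrative proximity measure between a
 * 64-bit peer ID prefix and the key prefix (smaller is closer). */
uint64_t
xor_distance (uint64_t peer_id, uint64_t key)
{
  return peer_id ^ key;
}
```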

@node PUTting data into the DHT2
@subsubsection PUTting data into the DHT2

@c %**end of header

To PUT data into the DHT, the service sends a @code{struct PeerPutMessage} of
type @code{GNUNET_MESSAGE_TYPE_DHT_P2P_PUT} to the respective neighbour. In
addition to the usual information about the content (type, routing options,
desired replication level for the content, expiration time, key and value), the
message contains a fixed-size Bloom filter with information about which peers
(may) have already seen this request. This Bloom filter is used to ensure that
DHT messages never loop back to a peer that has already processed the request.
Additionally, the message includes the current hop counter and, depending on
the routing options, the message may include the full path that the message has
taken so far. The Bloom filter should already contain the identity of the
previous hop; however, the path should not include the identity of the previous
hop and the receiver should append the identity of the sender to the path, not
its own identity (this is done to reduce bandwidth).

@node GETting data from the DHT2
@subsubsection GETting data from the DHT2

@c %**end of header

A peer can search the DHT by sending @code{struct PeerGetMessage}s of type
@code{GNUNET_MESSAGE_TYPE_DHT_P2P_GET} to other peers. In addition to the usual
information about the request (type, routing options, desired replication level
for the request, the key and the extended query), a GET request also again
contains a hop counter, a Bloom filter over the peers that have processed the
request already and, depending on the routing options, the full path traversed
by the GET. Finally, a GET request includes a variable-size second Bloom filter
and a so-called Bloom filter mutator value which together indicate which
replies the sender has already seen. During the lookup, each block that matches
the block type, key and extended query is additionally subjected to a test
against this Bloom filter. The block plugin is expected to take the hash of the
block and combine it with the mutator value and check if the result is not yet
in the Bloom filter. The originator of the query will from time to time modify
the mutator to (eventually) allow false-positives filtered by the Bloom filter
to be returned.
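The mutator mechanism can be illustrated with a toy filter. This is not
GNUnet's actual Bloom filter or hash combination; all names and the mixing
step here are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy sketch of reply filtering: the hash of a candidate block is
 * combined with the query's mutator value and the result is tested
 * against a small bit array.  Changing the mutator re-maps blocks
 * to different bits, which is why rotating it eventually lets
 * false-positives filtered earlier be returned after all. */
#define FILTER_BITS 64

static uint64_t
mutate (uint64_t block_hash, uint32_t mutator)
{
  /* Simple mixing standing in for the real hash combination. */
  return (block_hash ^ mutator) * 0x9E3779B97F4A7C15ULL;
}

void
filter_add (uint64_t *filter, uint64_t block_hash, uint32_t mutator)
{
  *filter |= 1ULL << (mutate (block_hash, mutator) % FILTER_BITS);
}

bool
filter_test (uint64_t filter, uint64_t block_hash, uint32_t mutator)
{
  return 0 != (filter & (1ULL << (mutate (block_hash, mutator) % FILTER_BITS)));
}
```

Note how a block added under one mutator is (typically) no longer matched once
the mutator changes, modelling why the originator rotates it over time.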

Peers that receive a GET request perform a local lookup (depending on their
proximity to the key and the query options) and forward the request to other
peers. They then remember the request (including the Bloom filter for blocking
duplicate results) and when they obtain a matching, non-filtered response, a
@code{struct PeerResultMessage} of type@
@code{GNUNET_MESSAGE_TYPE_DHT_P2P_RESULT} is forwarded to the previous hop.
Whenever a result is forwarded, the block plugin is used to update the Bloom
filter accordingly, to ensure that the same result is never forwarded more than
once. The DHT service may also cache forwarded results locally if the
"CACHE_RESULTS" option is set to "YES" in the configuration.

@node The GNU Name System (GNS)
@section The GNU Name System (GNS)

@c %**end of header

The GNU Name System (GNS) is a decentralized database that enables users to
securely resolve names to values. Names can be used to identify other users
(for example, in social networking), or network services (for example, VPN
services running at a peer in GNUnet, or purely IP-based services on the
Internet). Users interact with GNS by typing in a hostname that ends in ".gnu"
or ".zkey".

Videos giving an overview of most of GNS and the motivations behind it are
available here and here. The remainder of this chapter targets developers who
are familiar with the high-level concepts of GNS as presented in these talks.

GNS-aware applications should use the GNS resolver to obtain the respective
records that are stored under that name in GNS. Each record consists of a type,
value, expiration time and flags.

The type specifies the format of the value. Types below 65536 correspond to DNS
record types, larger values are used for GNS-specific records. Applications can
define new GNS record types by reserving a number and implementing a plugin
(which mostly needs to convert the binary value representation to a
human-readable text format and vice-versa). The expiration time specifies how
long the record is to be valid. The GNS API ensures that applications are only
given non-expired values. The flags are typically irrelevant for applications,
as GNS uses them internally to control visibility and validity of records.
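The type-number split described above can be expressed as a one-line predicate.
The helper is illustrative only; the constant 65536 simply encodes the boundary
stated in the text:

```c
#include <stdint.h>
#include <stdbool.h>

/* Record type numbers below 65536 correspond to DNS record types;
 * larger values are reserved for GNS-specific records. */
bool
is_dns_compatible_type (uint32_t record_type)
{
  return record_type < 65536;
}
```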

Records are stored along with a signature. The signature is generated using the
private key of the authoritative zone. This allows any GNS resolver to verify
the correctness of a name-value mapping.

Internally, GNS uses the NAMECACHE to cache information obtained from other
users, the NAMESTORE to store information specific to the local users, and the
DHT to exchange data between users. A plugin API is used to enable applications
to define new GNS record types.

@menu
* libgnunetgns::
* libgnunetgnsrecord::
* GNS plugins::
* The GNS Client-Service Protocol::
* Hijacking the DNS-Traffic using gnunet-service-dns::
* Serving DNS lookups via GNS on W32::
@end menu

@node libgnunetgns
@subsection libgnunetgns

@c %**end of header

The GNS API itself is extremely simple. Clients first connect to the GNS service
using @code{GNUNET_GNS_connect}. They can then perform lookups using
@code{GNUNET_GNS_lookup} or cancel pending lookups using
@code{GNUNET_GNS_lookup_cancel}. Once finished, clients disconnect using
@code{GNUNET_GNS_disconnect}.


@menu
* Looking up records::
* Accessing the records::
* Creating records::
* Future work::
@end menu

@node Looking up records
@subsubsection Looking up records

@c %**end of header

@code{GNUNET_GNS_lookup} takes a number of arguments:

@table @asis
@item handle This is simply the GNS connection handle from
@code{GNUNET_GNS_connect}.
@item name The client needs to specify the name to
be resolved. This can be any valid DNS or GNS hostname.
@item zone The client
needs to specify the public key of the GNS zone against which the resolution
should be done (the ".gnu" zone). Note that a key must be provided, even if the
name ends in ".zkey". This should typically be the public key of the
master-zone of the user.
@item type This is the desired GNS or DNS record type
to look for. While all records for the given name will be returned, this can be
important if the client wants to resolve record types that themselves delegate
resolution, such as CNAME, PKEY or GNS2DNS. Resolving a record of any of these
types will only work if the respective record type is specified in the request,
as the GNS resolver will otherwise follow the delegation and return the records
from the respective destination, instead of the delegating record.
@item only_cached This argument should typically be set to @code{GNUNET_NO}. Setting
it to @code{GNUNET_YES} disables resolution via the overlay network.
@item shorten_zone_key If GNS encounters new names during resolution, their
respective zones can automatically be learned and added to the "shorten zone".
If this is desired, clients must pass the private key of the shorten zone. If
NULL is passed, shortening is disabled.
@item proc This argument identifies
the function to call with the result. It is given proc_cls, the number of
records found (possibly zero) and the array of the records as arguments. proc
will only be called once. After proc has been called, the lookup must no
longer be cancelled.
@item proc_cls The closure for proc.
@end table

@node Accessing the records
@subsubsection Accessing the records

@c %**end of header

The @code{libgnunetgnsrecord} library provides an API to manipulate the GNS
record array that is given to proc. In particular, it offers functions such as
converting record values to human-readable strings (and back). However, most
@code{libgnunetgnsrecord} functions are not interesting to GNS client
applications.

For DNS records, the @code{libgnunetdnsparser} library provides functions for
parsing (and serializing) common types of DNS records.

@node Creating records
@subsubsection Creating records

@c %**end of header

Creating GNS records is typically done by building the respective record
information (possibly with the help of @code{libgnunetgnsrecord} and
@code{libgnunetdnsparser}) and then using the @code{libgnunetnamestore} to
publish the information. The GNS API is not involved in this
operation.

@node Future work
@subsubsection Future work

@c %**end of header

In the future, we want to expand @code{libgnunetgns} to allow applications to
observe shortening operations performed during GNS resolution, for example so
that users can receive visual feedback when this happens.

@node libgnunetgnsrecord
@subsection libgnunetgnsrecord

@c %**end of header

The @code{libgnunetgnsrecord} library is used to manipulate GNS records (in
plaintext or in their encrypted format). Applications mostly interact with
@code{libgnunetgnsrecord} by using the functions to convert GNS record values
to strings or vice-versa, or to lookup a GNS record type number by name (or
vice-versa). The library also provides various other functions that are mostly
used internally within GNS, such as converting keys to names, checking for
expiration, encrypting GNS records to GNS blocks, verifying GNS block
signatures and decrypting GNS records from GNS blocks.

We will now discuss the four commonly used functions of the API.@
@code{libgnunetgnsrecord} does not perform these operations itself, but instead
uses plugins to perform the operation. GNUnet includes plugins to support
common DNS record types as well as standard GNS record types.


@menu
* Value handling::
* Type handling::
@end menu

@node Value handling
@subsubsection Value handling

@c %**end of header

@code{GNUNET_GNSRECORD_value_to_string} can be used to convert the (binary)
representation of a GNS record value to a human readable, 0-terminated UTF-8
string. NULL is returned if the specified record type is not supported by any
available plugin.

@code{GNUNET_GNSRECORD_string_to_value} can be used to try to convert a human
readable string to the respective (binary) representation of a GNS record
value.

@node Type handling
@subsubsection Type handling

@c %**end of header

@code{GNUNET_GNSRECORD_typename_to_number} can be used to obtain the numeric
value associated with a given typename. For example, given the typename "A"
(for DNS A records), the function will return the number 1. A list of common
DNS record types is available
@uref{http://en.wikipedia.org/wiki/List_of_DNS_record_types, here}. Note that
not all DNS record types are supported by GNUnet GNSRECORD plugins at this
time.

@code{GNUNET_GNSRECORD_number_to_typename} can be used to obtain the typename
associated with a given numeric value. For example, given the type number 1,
the function will return the typename "A".
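A minimal stand-in for these two conversions, restricted to a handful of
well-known DNS types, might look as follows. The real functions delegate to
GNSRECORD plugins; this hard-coded table is purely illustrative:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Illustrative typename/number table covering a few standard DNS
 * record types.  Returns 0 (respectively NULL) for unknown input. */
struct TypeMap { const char *name; uint32_t number; };

static const struct TypeMap map[] = {
  { "A", 1 }, { "NS", 2 }, { "CNAME", 5 },
  { "MX", 15 }, { "TXT", 16 }, { "AAAA", 28 },
  { NULL, 0 }
};

uint32_t
typename_to_number (const char *name)
{
  for (unsigned int i = 0; NULL != map[i].name; i++)
    if (0 == strcmp (map[i].name, name))
      return map[i].number;
  return 0;
}

const char *
number_to_typename (uint32_t number)
{
  for (unsigned int i = 0; NULL != map[i].name; i++)
    if (map[i].number == number)
      return map[i].name;
  return NULL;
}
```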

@node GNS plugins
@subsection GNS plugins

@c %**end of header

Adding a new GNS record type typically involves writing (or extending) a
GNSRECORD plugin. The plugin needs to implement the
@code{gnunet_gnsrecord_plugin.h} API which provides basic functions that are
needed by GNSRECORD to convert typenames and values of the respective record
type to strings (and back). These gnsrecord plugins are typically implemented
within their respective subsystems. Examples for such plugins can be found in
the GNSRECORD, GNS and CONVERSATION subsystems.

The @code{libgnunetgnsrecord} library is then used to locate, load and query
the appropriate gnsrecord plugin. Which plugin is appropriate is determined by
the record type (which is just a 32-bit integer). The @code{libgnunetgnsrecord}
library loads all gnsrecord plugins that are installed at the local peer and
forwards the application request to the plugins. If the record type is not
supported by a plugin, it should simply return an error code.

The central functions of the gnsrecord APIs (plugin and main library) are the
same four functions for converting between values and strings, and between
typenames and numbers, documented in the previous subsection.

@node The GNS Client-Service Protocol
@subsection The GNS Client-Service Protocol

@c %**end of header

The GNS client-service protocol consists of two simple messages, the
@code{LOOKUP} message and the @code{LOOKUP_RESULT}. Each @code{LOOKUP} message
contains a unique 32-bit identifier, which will be included in the
corresponding response. Thus, clients can send many lookup requests in parallel
and receive responses out-of-order. A @code{LOOKUP} request also includes the
public key of the GNS zone, the desired record type and fields specifying
whether shortening is enabled or networking is disabled. Finally, the
@code{LOOKUP} message includes the name to be resolved.

The response includes the number of records and the records themselves in the
format created by @code{GNUNET_GNSRECORD_records_serialize}. They can thus be
deserialized using @code{GNUNET_GNSRECORD_records_deserialize}.

@node Hijacking the DNS-Traffic using gnunet-service-dns
@subsection Hijacking the DNS-Traffic using gnunet-service-dns

@c %**end of header

This section documents how the gnunet-service-dns (and the gnunet-helper-dns)
intercepts DNS queries from the local system.@ This is merely one method for
how we can obtain GNS queries. It is also possible to change @code{resolv.conf}
to point to a machine running @code{gnunet-dns2gns} or to modify libc's name
system switch (NSS) configuration to include a GNS resolution plugin. The
method described in this chapter is more of a last-ditch catch-all approach.

@code{gnunet-service-dns} enables intercepting DNS traffic using policy-based
routing. We MARK every outgoing DNS packet if it was not sent by our
application. Using a second routing table in the Linux kernel, these marked
packets are then routed through our virtual network interface and can thus be
captured unchanged.

Our application then reads the query and decides how to handle it: A query to
an address ending in ".gnu" or ".zkey" is hijacked by @code{gnunet-service-gns}
and resolved internally using GNS. In the future, a reverse query for an
address of the configured virtual network could be answered with records kept
about previous forward queries. Queries that are not hijacked by some
application using the DNS service will be sent to the original recipient. The
answer to the query will always be sent back through the virtual interface with
the original nameserver as source address.


@menu
* Network Setup Details::
@end menu

@node Network Setup Details
@subsubsection Network Setup Details

@c %**end of header

The DNS interceptor adds the following rules to the Linux kernel:
@example
iptables -t mangle -I OUTPUT 1 -p udp --sport $LOCALPORT --dport 53 -j ACCEPT
iptables -t mangle -I OUTPUT 2 -p udp --dport 53 -j MARK --set-mark 3
ip rule add fwmark 3 table 2
ip route add default via $VIRTUALDNS table 2
@end example

Line 1 makes sure that all packets coming from a port our application opened
beforehand (@code{$LOCALPORT}) will be routed normally. Line 2 marks every
other packet destined for a DNS server with mark 3 (chosen arbitrarily). The
last two lines add a routing policy and a default route so that packets
carrying mark 3 are routed through the virtual interface
(@code{$VIRTUALDNS}) via routing table 2.

@node Serving DNS lookups via GNS on W32
@subsection Serving DNS lookups via GNS on W32

@c %**end of header

This section documents how the libw32nsp (and gnunet-gns-helper-service-w32) do
DNS resolutions of DNS queries on the local system. This only applies to GNUnet
running on W32.

W32 has a concept of "Namespaces" and "Namespace providers". These are used to
present various name systems to applications in a generic way. Namespaces
include DNS, mDNS, NLA and others. For each namespace, any number of providers
can be registered, and they are queried in order of priority (which is
adjustable).

Applications can resolve names by using the WSALookupService*() family of
functions.

However, these are WSA-only facilities. Common BSD socket functions for
namespace resolutions are gethostbyname and getaddrinfo (among others). These
functions are implemented internally (by default - by mswsock, which also
implements the default DNS provider) as wrappers around WSALookupService*()
functions (see "Sample Code for a Service Provider" on MSDN).

On W32 GNUnet builds a libw32nsp - a namespace provider, which can then be
installed into the system by using w32nsp-install (and uninstalled by
w32nsp-uninstall), as described in "Installation Handbook".

libw32nsp is very simple and has almost no dependencies. As a response to
NSPLookupServiceBegin(), it only checks that the provider GUID passed to it by
the caller matches the GNUnet DNS Provider GUID, checks that the name being
resolved ends in ".gnu" or ".zkey", then connects to
gnunet-gns-helper-service-w32 at 127.0.0.1:5353 (hardcoded), sends the name
resolution request there, and returns the connected socket to the caller.

When the caller invokes NSPLookupServiceNext(), libw32nsp reads a completely
formed reply from that socket, unmarshalls it, then gives it back to the
caller.

At the moment, gnunet-gns-helper-service-w32 is implemented to only ever give
one reply, and subsequent calls to NSPLookupServiceNext() will fail with
WSA_NODATA (the first call to NSPLookupServiceNext() might also fail if GNS
failed to find the name, or if there was an error connecting to the service).

gnunet-gns-helper-service-w32 does most of the processing:

@itemize @bullet
@item Maintains a connection to GNS.
@item Reads GNS config and loads appropriate keys.
@item Checks the service GUID and decides on the type of record to look up,
refusing to make a lookup outright when an unsupported service GUID is passed.
@item Launches the lookup.
@end itemize

When a lookup result arrives, gnunet-gns-helper-service-w32 forms a complete
reply (including filling a WSAQUERYSETW structure and, possibly, a binary blob
with a hostent structure for gethostbyname() client), marshalls it, and sends
it back to libw32nsp. If no records were found, it sends an empty header.

This works for most normal applications that use gethostbyname() or
getaddrinfo() to resolve names, but fails to do anything with applications that
use alternative means of resolving names (such as sending queries to a DNS
server directly by themselves). This includes some well-known utilities, like
"ping" and "nslookup".

@node The GNS Namecache
@section The GNS Namecache

@c %**end of header

The NAMECACHE subsystem is responsible for caching (encrypted) resolution
results of the GNU Name System (GNS). GNS makes zone information available to
other users via the DHT. However, as accessing the DHT for every lookup is
expensive (and as the DHT's local cache is lost whenever the peer is
restarted), GNS uses the NAMECACHE as a more persistent cache for DHT lookups.
Thus, instead of always looking up every name in the DHT, GNS first checks if
the result is already available locally in the NAMECACHE. Only if there is no
result in the NAMECACHE, GNS queries the DHT. The NAMECACHE stores data in the
same (encrypted) format as the DHT. It thus makes no sense to iterate over all
items in the NAMECACHE --- the NAMECACHE does not have a way to provide the
keys required to decrypt the entries.

Blocks in the NAMECACHE share the same expiration mechanism as blocks in the
DHT --- the block expires whenever any of the records in the (encrypted) block
expires. The expiration time of the block is the only information stored in
plaintext. The NAMECACHE service internally performs all of the required work
to expire blocks; clients do not have to worry about this. Also, given that
NAMECACHE stores only GNS blocks that local users requested, there is no
configuration option to limit the size of the NAMECACHE. It is assumed to be
always small enough (a few MB) to fit on the drive.
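The expiration rule above can be stated in code: the plaintext expiration time
of a block is the minimum of its records' expiration times, since the block
expires as soon as any contained record does. The helper function here is
hypothetical, not part of the NAMECACHE API:

```c
#include <stdint.h>

/* A block expires whenever any of its records expires, so its
 * effective (plaintext) expiration is the minimum of the records'
 * expiration times.  Returns UINT64_MAX for an empty record set. */
uint64_t
block_expiration (const uint64_t *record_expirations, unsigned int count)
{
  uint64_t min = UINT64_MAX;
  for (unsigned int i = 0; i < count; i++)
    if (record_expirations[i] < min)
      min = record_expirations[i];
  return min;
}
```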

The NAMECACHE supports the use of different database backends via a plugin API.

@menu
* libgnunetnamecache::
* The NAMECACHE Client-Service Protocol::
* The NAMECACHE Plugin API::
@end menu

@node libgnunetnamecache
@subsection libgnunetnamecache

@c %**end of header

The NAMECACHE API consists of five simple functions. First, there is
@code{GNUNET_NAMECACHE_connect} to connect to the NAMECACHE service. This
returns the handle required for all other operations on the NAMECACHE. Using
@code{GNUNET_NAMECACHE_block_cache} clients can insert a block into the cache.
@code{GNUNET_NAMECACHE_lookup_block} can be used to lookup blocks that were
stored in the NAMECACHE. Both operations can be cancelled using
@code{GNUNET_NAMECACHE_cancel}. Note that cancelling a
@code{GNUNET_NAMECACHE_block_cache} operation can result in the block being
stored in the NAMECACHE --- or not. Cancellation primarily ensures that the
continuation function with the result of the operation will no longer be
invoked. Finally, @code{GNUNET_NAMECACHE_disconnect} closes the connection to
the NAMECACHE.

The maximum size of a block that can be stored in the NAMECACHE is
@code{GNUNET_NAMECACHE_MAX_VALUE_SIZE}, which is defined to be 63 kB.

@node The NAMECACHE Client-Service Protocol
@subsection The NAMECACHE Client-Service Protocol

@c %**end of header

All messages in the NAMECACHE IPC protocol start with the @code{struct
GNUNET_NAMECACHE_Header} which adds a request ID (32-bit integer) to the
standard message header. The request ID is used to match requests with the
respective responses from the NAMECACHE, as they are allowed to happen
out-of-order.
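The out-of-order matching can be sketched as follows. The structures here are
simplified stand-ins, not the actual @code{struct GNUNET_NAMECACHE_Header} or
the service's internal bookkeeping:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Each request carries a 32-bit ID; a response is matched to the
 * pending request with the same ID, so responses may arrive in any
 * order.  Returns NULL if no unanswered request matches. */
struct PendingRequest { uint32_t request_id; bool answered; };

struct PendingRequest *
match_response (struct PendingRequest *pending, unsigned int count,
                uint32_t response_id)
{
  for (unsigned int i = 0; i < count; i++)
    if ((pending[i].request_id == response_id) && (! pending[i].answered))
      return &pending[i];
  return NULL;
}
```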


@menu
* Lookup::
* Store::
@end menu

@node Lookup
@subsubsection Lookup

@c %**end of header

The @code{struct LookupBlockMessage} is used to lookup a block stored in the
cache. It contains the query hash. The NAMECACHE always responds with a
@code{struct LookupBlockResponseMessage}. If the NAMECACHE has no response, it
sets the expiration time in the response to zero. Otherwise, the response is
expected to contain the expiration time, the ECDSA signature, the derived key
and the (variable-size) encrypted data of the block.

@node Store
@subsubsection Store

@c %**end of header

The @code{struct BlockCacheMessage} is used to cache a block in the NAMECACHE.
It has the same structure as the @code{struct LookupBlockResponseMessage}. The
service responds with a @code{struct BlockCacheResponseMessage} which contains
the result of the operation (success or failure). In the future, we might want
to make it possible to provide an error message as well.

@node The NAMECACHE Plugin API
@subsection The NAMECACHE Plugin API
@c %**end of header

The NAMECACHE plugin API consists of two functions, @code{cache_block} to store
a block in the database, and @code{lookup_block} to lookup a block in the
database.


@menu
* Lookup2::
* Store2::
@end menu

@node Lookup2
@subsubsection Lookup2

@c %**end of header

The @code{lookup_block} function is expected to return at most one block to the
iterator, and return @code{GNUNET_NO} if there were no non-expired results. If
there are multiple non-expired results in the cache, the lookup is supposed to
return the result with the largest expiration time.

@node Store2
@subsubsection Store2

@c %**end of header

The @code{cache_block} function is expected to try to store the block in the
database, and return @code{GNUNET_SYSERR} if this was not possible for any
reason. Furthermore, @code{cache_block} is expected to implicitly perform cache
maintenance and purge blocks from the cache that have expired. Note that
@code{cache_block} might encounter the case where the database already has
another block stored under the same key. In this case, the plugin must ensure
that the block with the larger expiration time is preserved. This can be done
either by simply adding new blocks and selecting the one with the largest
expiration time during lookup, or by checking which block expires later during
the store operation.

@node The REVOCATION Subsystem
@section The REVOCATION Subsystem
@c %**end of header

The REVOCATION subsystem is responsible for key revocation of Egos. If users
learn that their private key has been compromised or lost, they can use the
REVOCATION subsystem to inform all other users that the key is no longer
valid. The subsystem thus provides ways to query the validity of keys and to
propagate revocation messages.

@menu
* Dissemination::
* Revocation Message Design Requirements::
* libgnunetrevocation::
* The REVOCATION Client-Service Protocol::
* The REVOCATION Peer-to-Peer Protocol::
@end menu

@node Dissemination
@subsection Dissemination

@c %**end of header

When a revocation is performed, the revocation is first of all disseminated by
flooding the overlay network. The goal is to reach every peer, so that when a
peer needs to check if a key has been revoked, this will be purely a local
operation where the peer consults its local revocation list. Flooding the
network is also the most robust form of key revocation --- an adversary would
have to control a separator of the overlay graph to restrict the propagation of
the revocation message. Flooding is also very easy to implement --- peers that
receive a revocation message for a key that they have never seen before simply
pass the message to all of their neighbours.

Flooding can only distribute the revocation message to peers that are online.
In order to notify peers that join the network later, the revocation service
performs efficient set reconciliation over the sets of known revocation
messages whenever two peers (that both support REVOCATION dissemination)
connect. The SET service is used to perform this operation
efficiently.

@node Revocation Message Design Requirements
@subsection Revocation Message Design Requirements

@c %**end of header

However, flooding is also quite costly, creating O(|E|) messages on a network
with |E| edges. Thus, revocation messages are required to contain a
proof-of-work, the result of an expensive computation (which, however, is cheap
to verify). Only peers that have expended the CPU time necessary to provide
this proof will be able to flood the network with the revocation message. This
ensures that an attacker cannot simply flood the network with millions of
revocation messages. The proof-of-work required by GNUnet is set to take days
on a typical PC to compute; if the ability to quickly revoke a key is needed,
users have the option to pre-compute revocation messages to store off-line and
use instantly if their key is ever compromised.

Revocation messages must also be signed by the private key that is being
revoked. Thus, they can only be created while the private key is in the
possession of the respective user. This is another reason to create a
revocation message ahead of time and store it in a secure location.

@node libgnunetrevocation
@subsection libgnunetrevocation

@c %**end of header

The REVOCATION API consists of two parts, to query and to issue
revocations.


@menu
* Querying for revoked keys::
* Preparing revocations::
* Issuing revocations::
@end menu

@node Querying for revoked keys
@subsubsection Querying for revoked keys

@c %**end of header

@code{GNUNET_REVOCATION_query} is used to check if a given ECDSA public key has
been revoked. The given callback will be invoked with the result of the check.
The query can be cancelled using @code{GNUNET_REVOCATION_query_cancel} on the
return value.

@node Preparing revocations
@subsubsection Preparing revocations

@c %**end of header

It is often desirable to create a revocation record ahead-of-time and store it
in an off-line location to be used later in an emergency. This is particularly
true for GNUnet revocations, where performing the revocation operation itself
is computationally expensive and thus is likely to take some time. Thus, if
users want the ability to perform revocations quickly in an emergency, they
must pre-compute the revocation message. The revocation API enables this with
two functions that are used to compute the revocation message, but not trigger
the actual revocation operation.

@code{GNUNET_REVOCATION_check_pow} should be used to calculate the
proof-of-work required in the revocation message. This function takes the
public key, the required number of bits for the proof of work (which in GNUnet
is a network-wide constant) and finally a proof-of-work number as arguments.
The function then checks if the given proof-of-work number is a valid proof of
work for the given public key. Clients preparing a revocation are expected to
call this function repeatedly (typically with a monotonically increasing
sequence of proof-of-work numbers) until a given number satisfies
the check. That number should then be saved for later use in the revocation
operation.

@code{GNUNET_REVOCATION_sign_revocation} is used to generate the signature that
is required in a revocation message. It takes the private key that (possibly in
the future) is to be revoked and returns the signature. The signature can again
be saved to disk for later use, which will then allow performing a revocation
even without access to the private key.

@node Issuing revocations
@subsubsection Issuing revocations


Given an ECDSA public key, the signature from
@code{GNUNET_REVOCATION_sign_revocation} and
the proof-of-work, @code{GNUNET_REVOCATION_revoke} can be used to perform the
actual revocation. The given callback is called upon completion of the
operation. @code{GNUNET_REVOCATION_revoke_cancel} can be used to stop the
library from calling the continuation; however, in that case it is undefined
whether or not the revocation operation will be executed.

@node The REVOCATION Client-Service Protocol
@subsection The REVOCATION Client-Service Protocol


The REVOCATION protocol consists of four simple messages.

A @code{QueryMessage} containing a public ECDSA key is used to check if a
particular key has been revoked. The service responds with a
@code{QueryResponseMessage} which simply contains a bit that says if the given
public key is still valid, or if it has been revoked.

The second possible interaction is for a client to revoke a key by passing a
@code{RevokeMessage} to the service. The @code{RevokeMessage} contains the
ECDSA public key to be revoked, a signature by the corresponding private key
and the proof-of-work. The service responds with a
@code{RevocationResponseMessage}, which either indicates that the
@code{RevokeMessage} was invalid (i.e., the proof-of-work was incorrect) or
that the revocation has been processed successfully.

@node The REVOCATION Peer-to-Peer Protocol
@subsection The REVOCATION Peer-to-Peer Protocol

@c %**end of header

Revocation uses two disjoint ways to spread revocation information among peers.
First of all, P2P gossip exchanged via CORE-level neighbours is used to quickly
spread revocations to all connected peers. Second, whenever two peers (that
both support revocations) connect, the SET service is used to compute the union
of the respective revocation sets.

In both cases, the exchanged messages are @code{RevokeMessage}s which contain
the public key that is being revoked, a matching ECDSA signature, and a
proof-of-work. Whenever a peer learns about a new revocation this way, it first
validates the signature and the proof-of-work, then stores it to disk
(typically to a file $GNUNET_DATA_HOME/revocation.dat) and finally spreads the
information to all directly connected neighbours.

For computing the union using the SET service, the peer with the smaller hashed
peer identity will connect (as a "client" in the two-party set protocol) to the
other peer after one second (to reduce traffic spikes on connect) and initiate
the computation of the set union. All revocation services use a common hash to
identify the SET operation over revocation sets.

The current implementation accepts revocation set union operations from all
peers at any time; however, well-behaved peers should only initiate this
operation once after establishing a connection to a peer with a larger hashed
peer identity.

@node GNUnet's File-sharing (FS) Subsystem
@section GNUnet's File-sharing (FS) Subsystem

@c %**end of header

This chapter describes the details of how the file-sharing service works. As
with all services, it is split into an API (libgnunetfs), the service process
(gnunet-service-fs) and user interface(s). The file-sharing service uses the
datastore service to store blocks and the DHT (and indirectly datacache) for
lookups for non-anonymous file-sharing. Furthermore, the file-sharing service
uses the block library (and the block fs plugin) for validation of DHT
operations.

In contrast to many other services, libgnunetfs is rather complex since the
client library includes a large number of high-level abstractions; this is
necessary since the FS service itself largely only operates on the block level.
The FS library is responsible for providing a file-based abstraction to
applications, including directories, meta data, keyword search, verification,
and so on.

The method used by GNUnet to break large files into blocks and to use keyword
search is called the "Encoding for Censorship Resistant Sharing" (ECRS). ECRS
is largely implemented in the fs library; block validation is also reflected in
the block FS plugin and the FS service. ECRS on-demand encoding is implemented
in the FS service.

NOTE: The documentation in this chapter is quite incomplete.

@menu
* Encoding for Censorship-Resistant Sharing (ECRS)::
* File-sharing persistence directory structure::
@end menu

@node Encoding for Censorship-Resistant Sharing (ECRS)
@subsection Encoding for Censorship-Resistant Sharing (ECRS)

@c %**end of header

When GNUnet shares files, it uses a content encoding that is called ECRS, the
Encoding for Censorship-Resistant Sharing. Most of ECRS is described in the
(so far unpublished) research paper referenced at the end of this section.
ECRS obsoletes the previous ESED and ESED II encodings which were used in
GNUnet before version 0.7.0. The rest of this section assumes that the reader
is familiar with that paper. What follows is a description of some minor
extensions that
GNUnet makes over what is described in the paper. The reason why these
extensions are not in the paper is that we felt that they were obvious or
trivial extensions to the original scheme and thus did not warrant space in
the research report.


@menu
* Namespace Advertisements::
* KSBlocks::
@end menu

@node Namespace Advertisements
@subsubsection Namespace Advertisements

@c %**end of header

An @code{SBlock} with identifier ``all zeros'' is a signed
advertisement for a namespace. This special @code{SBlock} contains metadata
describing the content of the namespace. Instead of the identifier for a
potential update, it contains the identifier for the root of the
namespace. The URI should always be empty. The @code{SBlock} is signed with
the content provider's RSA private key (just like any other @code{SBlock}).
Peers can search for @code{SBlock}s in order to find out more about a
namespace.

@node KSBlocks
@subsubsection KSBlocks

@c %**end of header

GNUnet implements @code{KSBlocks} which are @code{KBlocks} that, instead of
encrypting a CHK and metadata, encrypt an @code{SBlock} instead. In other
words, @code{KSBlocks} enable GNUnet to find @code{SBlocks} using the global
keyword search. Usually the encrypted @code{SBlock} is a namespace
advertisement. The rationale behind @code{KSBlock}s and @code{SBlock}s is to
enable peers to discover namespaces via keyword searches, and, to associate
useful information with namespaces. When GNUnet finds @code{KSBlocks} during a
normal keyword search, it adds the information to an internal list of
discovered namespaces. Users looking for interesting namespaces can then
inspect this list, reducing the need for out-of-band discovery of namespaces.
Naturally, namespaces (or more specifically, namespace advertisements) can
also be referenced from directories, but @code{KSBlock}s should make it easier
to advertise namespaces for the owner of the pseudonym since they eliminate
the need to first create a directory.

Collections are also advertised using @code{KSBlock}s.

The ECRS paper is available at
@uref{https://gnunet.org/sites/default/files/ecrs.pdf, ecrs.pdf}.

@node File-sharing persistence directory structure
@subsection File-sharing persistence directory structure

@c %**end of header

This section documents how the file-sharing library implements persistence of
file-sharing operations and specifically the resulting directory structure.
This code is only active if the @code{GNUNET_FS_FLAGS_PERSISTENCE} flag was set
when calling @code{GNUNET_FS_start}. In this case, the file-sharing library
will try hard to ensure that all major operations (searching, downloading,
publishing, unindexing) are persistent, that is, can live longer than the
process itself. More specifically, an operation is supposed to live until it is
explicitly stopped.

If @code{GNUNET_FS_stop} is called before an operation has been stopped, a
@code{SUSPEND} event is generated and then when the process calls
@code{GNUNET_FS_start} next time, a @code{RESUME} event is generated.
Additionally, even if an application crashes (segfault, SIGKILL, system crash)
and hence @code{GNUNET_FS_stop} is never called and no @code{SUSPEND} events
are generated, operations are still resumed (with @code{RESUME} events). This
is implemented by constantly writing the current state of the file-sharing
operations to disk. Specifically, the current state is always written to disk
whenever anything significant changes (the exception is block-wise progress in
publishing and unindexing, since those operations would be slowed down
significantly and can be resumed cheaply even without detailed accounting).
Note that if the process crashes (or is killed) during a serialization
operation, FS does not guarantee that this specific operation is recoverable
(no strict transactional semantics, again for performance reasons). However,
all other unrelated operations should resume nicely.

Since we need to serialize the state continuously and want to recover as much
as possible even after crashing during a serialization operation, we do not use
one large file for serialization. Instead, several directories are used for the
various operations. When @code{GNUNET_FS_start} executes, the master
directories are scanned for files describing operations to resume. Sometimes,
these operations can refer to related operations in child directories which may
also be resumed at this point. Note that corrupted files are cleaned up
automatically. However, dangling files in child directories (those that are not
referenced by files from the master directories) are not automatically removed.

Persistence data is kept in a directory that begins with the "STATE_DIR" prefix
from the configuration file (by default, "$SERVICEHOME/persistence/") followed
by the name of the client as given to @code{GNUNET_FS_start} (for example,
"gnunet-gtk") followed by the actual name of the master or child directory.

The names for the master directories follow the names of the operations:

@itemize @bullet
@item "search"
@item "download"
@item "publish"
@item "unindex"
@end itemize

Each of the master directories contains a file (with a randomly chosen name)
for each active top-level (master) operation. Note that a download that is
associated with a search result is not a top-level operation.

In contrast to the master directories, the child directories are only consulted
when another operation refers to them. For each search, a subdirectory (named
after the master search synchronization file) contains the search results.
Search results can have an associated download, which is then stored in the
general "download-child" directory. Downloads can be recursive, in which case
children are stored in subdirectories mirroring the structure of the recursive
download (either starting in the master "download" directory or in the
"download-child" directory depending on how the download was initiated). For
publishing operations, the "publish-file" directory contains information about
the individual files and directories that are part of the publication. However,
this directory structure is flat and does not mirror the structure of the
publishing operation. Note that unindex operations cannot have associated child
operations.

@node GNUnet's REGEX Subsystem
@section GNUnet's REGEX Subsystem

@c %**end of header

Using the REGEX subsystem, you can discover peers that offer a particular
service using regular expressions. Peers that offer a service specify it
using a regular expression. Peers that want to patronize a service search
using a string. The REGEX subsystem then uses the DHT to return a set of
matching offerers to the patrons.

For the technical details, we have Max's defense talk and Max's Master's
thesis. An additional publication is under preparation and available to team
members (in Git).

@menu
* How to run the regex profiler::
@end menu

@node How to run the regex profiler
@subsection How to run the regex profiler

@c %**end of header

The gnunet-regex-profiler can be used to profile the usage of mesh/regex for a
given set of regular expressions and strings. Mesh/regex allows you to announce
your peer ID under a certain regex and search for peers matching a particular
regex using a string. See @uref{https://gnunet.org/szengel2012ms} for a full
introduction.

First of all, the regex profiler uses GNUnet testbed, thus all the implications
for testbed also apply to the regex profiler (for example you need
password-less ssh login to the machines listed in your hosts file).

@strong{Configuration}

Moreover, an appropriate configuration file is needed. Generally you can refer
to SVN HEAD: contrib/regex_profiler_infiniband.conf for an example
configuration. In the following paragraph the important details are
highlighted.

Announcing the regular expressions is done by the
gnunet-daemon-regexprofiler; therefore, you have to make sure it is started
by adding it to the AUTOSTART set of ARM:

@example
[regexprofiler]
AUTOSTART = YES
@end example

Furthermore you have to specify the location of the binary:
@example
[regexprofiler]
# Location of the gnunet-daemon-regexprofiler binary.
BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
# Regex prefix that will be applied to all regular expressions and
# search string.
REGEX_PREFIX = "GNVPN-0001-PAD"
@end example

When running the profiler with a large scale deployment, you probably want to
reduce the workload of each peer. Use the following options to do this:
@example
[dht]
# Force network size estimation
FORCE_NSE = 1

[dhtcache]
DATABASE = heap
# Disable RC-file for Bloom filter? (for benchmarking with limited IO
# availability)
DISABLE_BF_RC = YES
# Disable Bloom filter entirely
DISABLE_BF = YES

[nse]
# Minimize proof-of-work CPU consumption by NSE
WORKBITS = 1
@end example


@strong{Options}

To finally run the profiler, some options and the input data need to be
specified on the command line:

@example
gnunet-regex-profiler -c config-file -d log-file -n num-links \
  -p path-compression-length -s search-delay -t matching-timeout \
  -a num-search-strings hosts-file policy-dir search-strings-file
@end example

@table @code
@item config-file
the configuration file created earlier
@item log-file
file where to write statistics output
@item num-links
number of random links between started peers
@item path-compression-length
maximum path compression length in the DFA
@item search-delay
time to wait between peers having finished linking and starting to match
strings
@item matching-timeout
timeout after which to cancel the searching
@item num-search-strings
number of strings in the search-strings-file
@end table

The @code{hosts-file} should contain a list of hosts for the testbed, one per
line, in the following format: @code{user@@host_ip:port}.

The @code{policy-dir} is a folder containing text files, each of which holds
one or more regular expressions. A peer is started for each file in that
folder, and the regular expressions in the corresponding file are announced
by this peer.

The @code{search-strings-file} is a text file containing search strings, one
per line.

You can create regular expressions and search strings for every AS in the
Internet using the attached scripts. You need one of the
@uref{http://data.caida.org/datasets/routing/routeviews-prefix2as/, CAIDA
routeviews prefix2as} data files for this. Run @code{create_regex.py <filename>
<output path>} to create the regular expressions and @code{create_strings.py
<input path> <outfile>} to create a search strings file from the previously
created regular expressions.
